You can take a look at some projects like:

* JavaCC HTML Parser (http://www.quiotix.com/downloads/html-parser/)
* HEX - The HTML Enabled XML Parser (http://www-uk.hpl.hp.com/people/sth/java/hex.html)
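
If a separate library is more than you need, another option might be the lenient HTML parser that ships with the JDK's Swing classes (javax.swing.text.html.parser.ParserDelegator), which generally tolerates unbalanced tags such as a <P> with no </P>. Below is a minimal sketch, not a tested solution for your pages: the class name HtmlTextFilter and the method extractText are made up for illustration, and it assumes the whole page fits in a String.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: collects the text runs of an HTML page and drops the tags.
class HtmlTextFilter {

    /** Returns the text chunks found in the HTML, in document order. */
    static List<String> extractText(String html) throws IOException {
        final List<String> chunks = new ArrayList<>();

        // The callback receives parse events; only text events are kept here.
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                chunks.add(new String(data));
            }
        };

        try (Reader reader = new StringReader(html)) {
            // Third argument: ignore any charset declared inside the page.
            new ParserDelegator().parse(reader, callback, true);
        }
        return chunks;
    }

    public static void main(String[] args) throws IOException {
        // Sample input with unbalanced <P> tags (assumed, not from a real page).
        List<String> chunks = extractText(
            "<HTML><BODY><P>Header<P>Text to keep</BODY></HTML>");
        // The block near the bottom of the page is the last chunk collected.
        System.out.println(chunks.get(chunks.size() - 1));
    }
}
```

Since the text you want sits near the bottom of the page, taking the last chunk (or the last few) from the returned list may be enough; for a real page you would probably read the input from a FileReader or a URL stream instead of a String.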

Rgds,
Neeme

-----Original Message-----
From: Jaquiss, Robert [mailto:[EMAIL PROTECTED]]
Sent: Friday, November 16, 2001 10:44 PM
To: [EMAIL PROTECTED]
Subject: Looking for tools/ideas for filtering HTML

Hello:

I have just joined this list, and am also a beginning Java programmer. I apologize if this is not a suitable question for this list.

I need to write a filter for HTML pages. My goal is to read an HTML page, throwing away all the HTML code and just keeping a block of text that occurs near the bottom of the page. The HTML tags are liable to be unbalanced. There will be a <P> but no </P>. I found a sample program that used the SAXParser, but the SAXParser doesn't seem to handle unbalanced tags. Ideas/comments would be appreciated.

Thank you.

Regards,
Robert Jaquiss