You can take a look at some projects like:
* JavaCC HTML Parser (http://www.quiotix.com/downloads/html-parser/)
* HEX - The HTML Enabled XML Parser (http://www-uk.hpl.hp.com/people/sth/java/hex.html)
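
Another option that needs no extra library: the JDK ships a lenient HTML parser in javax.swing.text.html.parser (ParserDelegator). As far as I know, it closes unclosed tags such as <P> itself instead of failing the way a SAX/XML parser does. Below is a minimal sketch. It assumes that "the block near the bottom" can be taken to be the last non-empty run of text between flow-breaking tags (<P>, <BR>, <DIV>, ...). The class and method names (HtmlTextFilter, textBlocks, lastBlock) are made up for illustration:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper class: strips all markup and splits the page into text blocks.
class HtmlTextFilter {

    // Receives parser events; starts a new block at every tag that breaks the text flow.
    static class TextCollector extends HTMLEditorKit.ParserCallback {
        final List<String> blocks = new ArrayList<>();
        private StringBuilder current = new StringBuilder();

        // Finish the current block: collapse whitespace, skip it if empty.
        void endBlock() {
            String text = current.toString().replaceAll("\\s+", " ").trim();
            if (!text.isEmpty()) {
                blocks.add(text);
            }
            current = new StringBuilder();
        }

        @Override
        public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
            if (tag.breaksFlow()) endBlock();
        }

        @Override
        public void handleEndTag(HTML.Tag tag, int pos) {
            // Also called for end tags the parser inferred, e.g. a missing </P>.
            if (tag.breaksFlow()) endBlock();
        }

        @Override
        public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
            if (tag.breaksFlow()) endBlock(); // e.g. <BR>
        }

        @Override
        public void handleText(char[] data, int pos) {
            // Only character data arrives here; the tags themselves are thrown away.
            current.append(' ').append(data);
        }
    }

    // Returns the page's text blocks with all HTML markup removed.
    static List<String> textBlocks(Reader html) throws IOException {
        TextCollector collector = new TextCollector();
        new ParserDelegator().parse(html, collector, true);
        collector.endBlock(); // keep trailing text that no tag closed
        return collector.blocks;
    }

    // Assumed heuristic for "text near the bottom": the last non-empty block, or "".
    static String lastBlock(Reader html) throws IOException {
        List<String> blocks = textBlocks(html);
        return blocks.isEmpty() ? "" : blocks.get(blocks.size() - 1);
    }

    public static void main(String[] args) throws IOException {
        // Unbalanced markup: there is no </P> anywhere.
        String page = "<html><body><p>Navigation links<p>Some article text"
                + "<p>The block you want</body></html>";
        System.out.println(lastBlock(new StringReader(page)));
    }
}
```

For a real page you would pass a FileReader or an InputStreamReader instead of the StringReader. If the text you want is not literally the last block, you could pick it from textBlocks() by index or by searching for a marker string.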

Rgds,
Neeme

-----Original Message-----
From: Jaquiss, Robert [mailto:[EMAIL PROTECTED]]
Sent: Friday, November 16, 2001 10:44 PM
To: [EMAIL PROTECTED]
Subject: Looking for tools/ideas for filtering HTML


Hello:

     I have just joined this list, and I am also a beginning Java programmer.
I apologize if this is not a suitable question for this list. I need to
write a filter for HTML pages. My goal is to read an HTML page, throw away
all the HTML markup, and keep only a block of text that occurs near the
bottom of the page. The HTML tags are likely to be unbalanced: there will be
a <P> but no </P>. I found a sample program that used the SAX parser, but
SAXParser doesn't seem to handle unbalanced tags. Ideas or comments would be
appreciated. Thank you.

    Regards
   Robert Jaquiss


