and this: http://www.scrml.org
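
If a third-party parser turns out to be more than is needed, the HTML parser bundled with the JDK's Swing classes may be enough, since it tolerates unbalanced tags such as a <P> with no </P>. Below is a minimal sketch, not a definitive implementation: the class name HtmlTextFilter and the method extractText are made up for illustration, and picking out the one block of text near the bottom of the page is left to the caller.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper (name chosen for this example only).
public class HtmlTextFilter {

    // Returns the text content of an HTML page, one text run per line.
    // Tags are discarded, whether or not they are balanced.
    static String extractText(String html) {
        final StringBuilder text = new StringBuilder();

        // The parser reports each run of character data through handleText;
        // start and end tags go to other callbacks, which are left as no-ops.
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                text.append(data).append('\n');
            }
        };

        try {
            // Third argument true: ignore any charset declared in the page.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // Sample page with <p> tags that are never closed.
        String page = "<html><body><p>First paragraph<p>Second paragraph</body></html>";
        System.out.print(extractText(page));
    }
}
```

Since the wanted text sits near the bottom of the page, one option would be to keep only the last few lines of the returned string.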
>You can take a look at some projects like:
>* JavaCC HTML Parser (http://www.quiotix.com/downloads/html-parser/)
>* HEX - The HTML Enabled XML Parser
>  (http://www-uk.hpl.hp.com/people/sth/java/hex.html)
>
>Rgds,
>Neeme
>
>-----Original Message-----
>From: Jaquiss, Robert [mailto:[EMAIL PROTECTED]]
>Sent: Friday, November 16, 2001 10:44 PM
>To: [EMAIL PROTECTED]
>Subject: Looking for tools/ideas for filtering HTML
>
>Hello:
>
>     I have just joined this list, and am also a beginning Java programmer.
>I apologize if this is not a suitable question for this list. I need to
>write a filter for HTML pages. My goal is to read an HTML page, throwing
>away all the HTML code and just keeping a block of text that occurs near the
>bottom of the page. The HTML tags are liable to be unbalanced. There will be
>a <P> but no </P>. I found a sample program that used the SAXparser, but the
>SAXparser doesn't seem to handle unbalanced tags. Ideas/comments would be
>appreciated. Thank you.
>
>     Regards
>     Robert Jaquiss
>
>
>---------------------------------------------------------------------
>In case of troubles, e-mail: [EMAIL PROTECTED]
>To unsubscribe, e-mail: [EMAIL PROTECTED]
>For additional commands, e-mail: [EMAIL PROTECTED]

--
------------------------------
Max Guglielmino
Corrosive
http://www.corrosive.co.uk