Looking for tools/ideas for filtering HTML

Jaquiss, Robert Fri, 16 Nov 2001 12:43:04 -0800

Hello:

I have just joined this list and am also a beginning Java programmer. I apologize if this is not a suitable question for this list. I need to write a filter for HTML pages. My goal is to read an HTML page, discard all the HTML markup, and keep only a block of text that occurs near the bottom of the page. The HTML tags are liable to be unbalanced: for example, there will be a <P> but no </P>. I found a sample program that used a SAX parser, but it doesn't seem to handle unbalanced tags. Ideas and comments would be appreciated. Thank you.

Regards

Robert Jaquiss
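
One possible direction for the filter described above, offered as a minimal sketch and not a definitive answer: the HTML parser that ships with Swing (javax.swing.text.html.parser.ParserDelegator) is designed for real-world HTML and generally tolerates a <P> with no matching </P>. The class name HtmlTextFilter and the method extractText are made up for illustration. The sketch collects every text chunk the parser reports; picking out the particular block near the bottom of the page is left to the caller.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper class; the name is an assumption for this sketch.
class HtmlTextFilter {

    /** Returns the text chunks found in the HTML, one per line, with tags dropped. */
    static String extractText(Reader html) throws IOException {
        final StringBuilder out = new StringBuilder();

        // The callback ignores tag events and only records text events.
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                out.append(data).append('\n');
            }
        };

        // The third argument tells the parser to ignore any charset
        // declared in the page instead of aborting on it.
        new ParserDelegator().parse(html, callback, true);
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        // Note the unbalanced <p> tags: neither paragraph is closed.
        String page = "<html><body><p>First paragraph<p>Second paragraph</body></html>";
        System.out.print(extractText(new StringReader(page)));
    }
}
```

To keep only the block near the bottom, the caller could split the returned string on newlines and take the last few entries, or start recording in handleText only after some marker text has been seen.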
