Re: [android-developers] Parsing HTML

Marc Petit-Huguenin Thu, 28 Jan 2010 15:04:57 -0800

On 01/28/2010 02:31 PM, Allison Inouye wrote:
> I am trying to parse an HTML document that is missing an end tag on
> one of the elements (input tag). Anyone know how to get the parser to
> ignore that it doesn't have an end tag and just read an attribute
> value?
> 
> DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
> DocumentBuilder builder = dbf.newDocumentBuilder();
> Document dom = builder.parse(url.openStream()); //ERROR HERE
> 
> Error:
> 01-28 21:34:38.384: WARN/System.err(12108):
> org.xml.sax.SAXParseException: expected: /input read: div
> (position:END_TAG </div>@21:10 in java.io.inputstreamrea...@432749f8)
>


I was able to parse badly written HTML (is there another kind?) as XML by using
JTidy (not on Android so YMMV):

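import org.w3c.dom.Document;
import org.w3c.tidy.Tidy;

Tidy tidy = new Tidy();
tidy.setXmlOut(true);          // emit well-formed XML rather than HTML
tidy.setShowWarnings(false);   // don't report markup warnings
tidy.setQuiet(true);           // suppress the summary output
// connection is assumed to be an already-open URLConnection;
// the null second argument means no tidied copy is written out
Document dom = tidy.parseDOM(connection.getInputStream(), null);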
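
For the attribute value itself, once parseDOM has handed back a Document,
plain DOM calls should be enough. A minimal sketch, assuming the tidied output
keeps the element name as lowercase "input". InputAttributeReader,
inputAttributes and parseWellFormed are made-up names for illustration, and the
hardcoded string only stands in for what Tidy would return, so the example runs
without JTidy on the classpath:

```java
import java.io.ByteArrayInputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class InputAttributeReader {

    // Hypothetical helper: collects the named attribute from every <input>
    // element in the document, skipping inputs that do not carry it.
    static List<String> inputAttributes(Document dom, String attributeName) {
        List<String> values = new ArrayList<String>();
        NodeList inputs = dom.getElementsByTagName("input");
        for (int i = 0; i < inputs.getLength(); i++) {
            Element input = (Element) inputs.item(i);
            if (input.hasAttribute(attributeName)) {
                values.add(input.getAttribute(attributeName));
            }
        }
        return values;
    }

    // Stand-in for the tidied document (assumption): parses a string that is
    // already well-formed, which is what the XML output of Tidy should be.
    static Document parseWellFormed(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));
    }

    public static void main(String[] args) throws Exception {
        Document dom = parseWellFormed(
                "<html><body><div><input name=\"q\" value=\"hello\"/></div></body></html>");
        System.out.println(inputAttributes(dom, "value")); // prints [hello]
    }
}
```

In the real case you would pass the Document returned by tidy.parseDOM(...)
to inputAttributes instead of the stand-in string.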

