On 01/28/2010 02:31 PM, Allison Inouye wrote:
> I am trying to parse an HTML document that is missing an end tag on
> one of the elements (input tag). Anyone know how to get the parser to
> ignore that it doesn't have an end tag and just read an attribute
> value?
>
> DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
> DocumentBuilder builder = dbf.newDocumentBuilder();
> Document dom = builder.parse(url.openStream()); //ERROR HERE
>
> Error:
> 01-28 21:34:38.384: WARN/System.err(12108):
> org.xml.sax.SAXParseException: expected: /input read: div
> (position:END_TAG </div>@21:10 in java.io.inputstreamrea...@432749f8)
>
I was able to parse badly written HTML (is there another kind?) as XML
by using JTidy (not on Android, so YMMV):

    import org.w3c.dom.Document;
    import org.w3c.tidy.*;

    Tidy tidy = new Tidy();
    tidy.setXmlOut(true);          // emit well-formed XML rather than HTML
    tidy.setShowWarnings(false);   // don't report markup warnings
    tidy.setQuiet(true);           // suppress the summary output
    // Second argument is an optional OutputStream for the tidied markup;
    // null means only the DOM is returned.
    Document dom = tidy.parseDOM(connection.getInputStream(), null);
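In case it helps anyone reproduce the original failure off-device, here is a minimal sketch that uses only the JDK's built-in parser. The class name StrictXmlDemo, the method readInputValue and the sample markup are made up for illustration, and the exception text on a desktop JVM will not match the Android message quoted above. It only shows that a strict XML parser rejects an unclosed input tag and accepts the same tag once it is closed, which is the gap a cleanup step such as JTidy is meant to fill.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.SAXException;

class StrictXmlDemo {

    // Hypothetical helper: returns the "value" attribute of the first <input>
    // element, or null if the markup is not well-formed XML or has no <input>.
    static String readInputValue(String markup) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document dom = builder.parse(
                    new ByteArrayInputStream(markup.getBytes(StandardCharsets.UTF_8)));
            Element input = (Element) dom.getElementsByTagName("input").item(0);
            return input == null ? null : input.getAttribute("value");
        } catch (SAXException e) {
            // A strict XML parser refuses markup with a missing end tag.
            return null;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Typical HTML: <input> is never closed, so this is not valid XML.
        String html = "<div><input name=\"q\" value=\"android\"></div>";
        // Same content with the element closed, as a tidied document would have it.
        String xml = "<div><input name=\"q\" value=\"android\"/></div>";

        System.out.println("unclosed input -> " + readInputValue(html));
        System.out.println("closed input   -> " + readInputValue(xml));
    }
}
```

The JDK parser may also print a "[Fatal Error]" line to stderr for the first document; that is its default error handler and not part of the sketch.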

