There have been a few similar questions. The basic issue is that SAX parsers require well-formed XML (or XHTML) as input. If you have control over the service (or can influence its authors), make the output valid, which, as you well know, means that the special characters <, >, ", ', and & need to be escaped. In your case, the & in the href must be written as &amp;. In PHP, this is easily done with the htmlspecialchars function. Tip: use validator.w3.org to see what's wrong with the document.
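
To illustrate the point with the snippet from the original post, here is a minimal sketch in plain Java. The class name AmpersandDemo and the helper parses() are made up for this example, and the escaping is hard-coded for this one string rather than being a general fix. It only shows that the same markup is rejected with a bare & and accepted once the & is written as &amp;. The exact error message depends on the parser implementation (Android's is Expat-based, a desktop JVM's is not).

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class AmpersandDemo {

    // Hypothetical helper: returns true if the SAX parser accepts the
    // string as well-formed XML, false if it reports a parse error.
    static boolean parses(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        try {
            parser.parse(new ByteArrayInputStream(xml.getBytes("UTF-8")),
                         new DefaultHandler());
            return true;
        } catch (SAXException e) {
            // SAXParseException (not well-formed) is a SAXException.
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // The markup from the original post, with a bare '&' in the href.
        String raw = "<div id=\"submenu\"><a href=\"/compte/console.pl"
                   + "?id=382730&idt=1cf6b94aa1a4cf84\"></a></div>";

        // The same markup with the '&' escaped as '&amp;'.
        String escaped = raw.replace("&idt=", "&amp;idt=");

        System.out.println("raw parses:     " + parses(raw));
        System.out.println("escaped parses: " + parses(escaped));
    }
}
```

A handler that reads the href attribute from the escaped version gets the value back with a plain &, so nothing downstream needs to un-escape it.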

If you can't change the service, or it is HTML anyway, here are some suggestions:

1) Use NekoHTML to preprocess the flaky markup into a DOM. You can then use SAXParser, XPath, XSLT, etc. to get the data. I haven't tried it on Android (it may be a bit heavyweight), but otherwise it is a great way to deal with flaky markup.

2) See if you can modify the SAXParser itself (can you say Open Source?) to relax the particular checks that are failing. If the source document is really bad (unbalanced tags, etc.), this is probably going to get too hairy.

3) Use regex to parse the page.

There are probably some other creative solutions. Which one is best depends on the details of the source document and what you want to do with it.

On Fri, Jan 1, 2010 at 6:12 AM, tlegras <[email protected]> wrote:
> Happy new years :)
>
> I am using SAXParser to parse an html page (any better solution?) and
> have this exception:
>
> W/System.err( 1358): org.apache.harmony.xml.ExpatParser$ParseException: At line 1, column 59: not well-formed (invalid token)
>
> I have reduced the page to this:
>
> <div id="submenu"><a href="/compte/console.pl?id=382730&idt=1cf6b94aa1a4cf84"></a></div>
>
> and what causes the exception is the '&' inside the href attribute
> value.
>
> Here is a minimalist test code:
>
> DefaultHandler emptySaxHandler = new DefaultHandler() {};
> String xmlstr = "<div id=\"submenu\"><a href=\"/compte/console.pl?id=382730&idt=1cf6b94aa1a4cf84\"></a></div>";
>
> SAXParserFactory factory = SAXParserFactory.newInstance();
> SAXParser saxParser = factory.newSAXParser();
> saxParser.parse(new ByteArrayInputStream(xmlstr.getBytes()),emptySaxHandler);
>
> is this a normal behaviour or kind of bug? if normal, what should do
> to preprocess the string before parsing?
>
> Thks for any help,
> Thierry.
>
> --
> You received this message because you are subscribed to the Google
> Groups "Android Developers" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to [email protected]
> For more options, visit this group at
> http://groups.google.com/group/android-developers?hl=en

