There have been a few similar questions. The basic issue is that SAX parsers
require well-formed XML (or XHTML) as input. If you have control over the
service (or can influence its authors), make the output well-formed, which,
as you well know, means that <, >, ", ', and & need to be escaped. In PHP,
this is easily done with the htmlspecialchars function (pass ENT_QUOTES if
single quotes must be escaped too). Tip: use validator.w3.org to see what's
wrong with the document.
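
To illustrate the escaping point, here is a minimal Java sketch. The class
and method names (EscapeDemo, escapeXml, parses) are made up for this
example, and it assumes a standard JAXP SAXParser; I have not run it on
Android. It shows that the same markup is rejected with a raw '&' and
accepted once the attribute value is escaped.

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class EscapeDemo {
    /** Escapes the five XML special characters in text or attribute values. */
    static String escapeXml(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    /** Returns true if the SAX parser accepts the document, false on a parse error. */
    static boolean parses(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new ByteArrayInputStream(xml.getBytes("UTF-8")),
                         new DefaultHandler());
            return true;
        } catch (SAXException e) {
            return false; // not well-formed
        } catch (Exception e) {
            throw new IllegalStateException(e); // setup or I/O problem
        }
    }

    public static void main(String[] args) {
        String href = "/compte/console.pl?id=382730&idt=1cf6b94aa1a4cf84";
        String raw = "<div id=\"submenu\"><a href=\"" + href + "\"></a></div>";
        String escaped = "<div id=\"submenu\"><a href=\""
                + escapeXml(href) + "\"></a></div>";
        System.out.println("raw parses: " + parses(raw));
        System.out.println("escaped parses: " + parses(escaped));
    }
}
```

The escaping belongs on the producing side, applied to each text or
attribute value before it is written into the markup, not to the whole
document.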

If you can't change the service or it is HTML anyway, here are some
suggestions:

1) Use NekoHTML to preprocess the flaky markup into a DOM. You can then
use SAXParser, XPath, XSLT, etc. to get the data. I haven't tried it on
Android - it may be a bit heavyweight, but otherwise it is a great way to
deal with flaky markup.
2) See if you can modify the SAXParser itself (can you say Open Source?) to
relax the particular checks. If the source document is really bad
(unbalanced tags, etc.), this is probably going to get too hairy.
3) Use regex to parse the page.
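
As a rough sketch of option 3, assuming all you need from the page is the
link targets, something like the following could work. HrefExtractor and
extractHrefs are names invented for this example, and the pattern is
deliberately simple: it only handles quoted href values and will miss
unquoted ones.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HrefExtractor {
    // Matches href="..." or href='...' inside an <a ...> tag, case-insensitively.
    // Group 1 is the quote character, group 2 is the attribute value.
    private static final Pattern HREF = Pattern.compile(
            "<a\\b[^>]*?\\bhref\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns the raw (still unescaped) href values in document order. */
    static List<String> extractHrefs(String html) {
        List<String> hrefs = new ArrayList<String>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            hrefs.add(m.group(2));
        }
        return hrefs;
    }

    public static void main(String[] args) {
        String html = "<div id=\"submenu\"><a href=\"/compte/console.pl"
                + "?id=382730&idt=1cf6b94aa1a4cf84\"></a></div>";
        System.out.println(extractHrefs(html));
    }
}
```

This never builds a tree, so the bare '&' is not a problem, but it gets
fragile quickly if you need nested structure rather than isolated values.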

There are probably some other creative solutions. Which one is best depends
on the details of the source document and what you want to do with it.
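
One such workaround, if the stray '&' is the only problem in the page, is
to escape bare ampersands before handing the string to the parser. This is
only a sketch under that assumption; AmpersandFixer, escapeBareAmpersands
and firstHref are hypothetical names, and it will not rescue markup with
unbalanced tags or undeclared entities such as &nbsp;.

```java
import java.io.ByteArrayInputStream;
import java.util.regex.Pattern;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class AmpersandFixer {
    // An '&' that does NOT start a named (&amp;), decimal (&#38;)
    // or hexadecimal (&#x26;) character reference.
    private static final Pattern BARE_AMP = Pattern.compile(
            "&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)");

    /** Replaces every bare '&' with "&amp;" and leaves existing references alone. */
    static String escapeBareAmpersands(String markup) {
        return BARE_AMP.matcher(markup).replaceAll("&amp;");
    }

    /** Preprocesses the markup, parses it with SAX, returns the first <a> href. */
    static String firstHref(String markup) {
        final String[] href = new String[1];
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            byte[] bytes = escapeBareAmpersands(markup).getBytes("UTF-8");
            parser.parse(new ByteArrayInputStream(bytes), new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes atts) {
                    if (href[0] == null && "a".equals(qName)) {
                        href[0] = atts.getValue("href"); // already unescaped
                    }
                }
            });
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return href[0];
    }

    public static void main(String[] args) {
        String page = "<div id=\"submenu\"><a href=\"/compte/console.pl"
                + "?id=382730&idt=1cf6b94aa1a4cf84\"></a></div>";
        System.out.println(firstHref(page));
    }
}
```

The handler matches on qName, which assumes the default, non-namespace-aware
factory; a namespace-aware parser may report the element name in localName
instead.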

On Fri, Jan 1, 2010 at 6:12 AM, tlegras <[email protected]> wrote:

> Happy new years :)
>
> I am using SAXParser to parse an html page (any better solution?) and
> have this exception:
>
>            W/System.err( 1358): org.apache.harmony.xml.ExpatParser$ParseException: At line 1, column 59: not well-formed (invalid token)
>
> I have reduced the page to this:
>
>            <div id="submenu"><a href="/compte/console.pl?id=382730&idt=1cf6b94aa1a4cf84"></a></div>
>
> and what causes the exception is the '&' inside the href attribute
> value.
>
> Here is a minimalist test code:
>
>            DefaultHandler emptySaxHandler = new DefaultHandler() {};
>            String xmlstr = "<div id=\"submenu\"><a href=\"/compte/console.pl?id=382730&idt=1cf6b94aa1a4cf84\"></a></div>";
>
>            SAXParserFactory factory = SAXParserFactory.newInstance();
>            SAXParser saxParser = factory.newSAXParser();
>            saxParser.parse(new ByteArrayInputStream(xmlstr.getBytes()), emptySaxHandler);
>
> is this a normal behaviour or kind of bug? if normal, what should do
> to preprocess the string before parsing?
>
> Thks for any help,
> Thierry.
>
> --
> You received this message because you are subscribed to the Google
> Groups "Android Developers" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]<android-developers%[email protected]>
> For more options, visit this group at
> http://groups.google.com/group/android-developers?hl=en
