[android-developers] Re: SAXParser fails on some RSS feeds

Tim Bray Sat, 28 Feb 2009 20:20:07 -0800

On Sat, Feb 28, 2009 at 5:53 PM, grennis <[email protected]> wrote:
>
> I'm using the SAX parser to read some RSS feeds and have found a
> problem


In general you can't use a real XML processor, which the Java SAX
stuff is, to read RSS feeds.  Lots and lots of them aren't XML at all.
Atom 1.0 is better, but lots of feeds aren't Atom.  Once somebody
ports either Jython or JRuby and gets it really running, the problem
is solved, because you can use the excellent Feedparser library, which
Just Works on any imaginable feed.  In the interim, you might want to
consider John Cowan's excellent TagSoup, which handles what its name
suggests.  Libxml2 also has a "forgiving" parser, but I don't know if
there's a Java interface to that. -T

> Some feeds, for example CNN Money Top Stories, have embedded
> some characters in their content, e.g. the copyright symbol. Well,
> that's not valid XML and the SAXParser fails with an exception
> "invalid token".
>
> The only help I have seen given is to fix the XML at the source and
> that's not an option obviously. So, I can think of 2 options and they
> both stink: (a) read the content first, scrub it, and then pass it to
> the parser. (b) use DOM instead of SAX.
>
> What I *want* to do is make the parser a little more forgiving and
> just accept or discard/ignore the bad text. I'm not having any luck with
> setErrorHandler. My error handler does not get called.
>
> Can anyone offer some help on this? Thanks
>
> >
>
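
For what it's worth, option (a) from the quoted message (read the
content, scrub it, then hand it to the parser) might look roughly like
the sketch below. It assumes the "invalid token" comes from bytes that
are not valid in the declared encoding, for example a Latin-1 copyright
symbol in a feed declared as UTF-8, which is only a guess about the CNN
feed. The class and method names (FeedScrubber, titles, and so on) are
made up for illustration, and it uses only the standard Java SAX API,
not TagSoup.

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: scrub a feed before giving it to a strict SAX parser.
final class FeedScrubber {

    // new String(bytes, charset) never throws on bad input; malformed
    // byte sequences are replaced with U+FFFD.
    static String decodeLeniently(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    // The Char production from the XML 1.0 specification.
    static boolean isLegalXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Drops control characters and unpaired surrogates that XML 1.0 forbids.
    static String stripIllegalXmlChars(String text) {
        StringBuilder out = new StringBuilder(text.length());
        text.codePoints()
                .filter(FeedScrubber::isLegalXmlChar)
                .forEach(out::appendCodePoint);
        return out.toString();
    }

    // Scrubs the raw bytes, then collects the text of every <title> element.
    static List<String> titles(byte[] rawFeed) {
        String cleaned = stripIllegalXmlChars(decodeLeniently(rawFeed));
        List<String> titles = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private StringBuilder current;

            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                if ("title".equals(qName)) {
                    current = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (current != null) {
                    current.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                if ("title".equals(qName) && current != null) {
                    titles.add(current.toString());
                    current = null;
                }
            }
        };
        try {
            // Parsing from a Reader means the bytes have already been decoded,
            // so the parser no longer sees the malformed sequences.
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(cleaned)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("feed is still not well-formed XML", e);
        }
        return titles;
    }
}
```

Note that the bad bytes come out as U+FFFD rather than the original
symbol. If the feed's real encoding is known (say ISO-8859-1), decoding
with that charset instead would keep the character. None of this helps
with feeds whose markup itself is broken (unclosed tags, bare
ampersands); that is the case a forgiving parser is meant for.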
