On Sat, Feb 28, 2009 at 5:53 PM, grennis <[email protected]> wrote:
>
> I'm using the SAX parser to read some RSS feeds and have found a
> problem. Some feeds, for example CNN Money Top Stories, have embedded
> some characters in their content, e.g. the copyright symbol. Well,
> that's not valid XML and the SAXParser fails with an exception
> "invalid token".
>
> The only help I have seen given is to fix the XML at the source and
> that's not an option obviously. So, I can think of 2 options and they
> both stink: (a) read the content first, scrub it, and then pass it to
> the parser. (b) use DOM instead of SAX.
>
> What I *want* to do is make the parser a little more forgiving and
> just accept or discard/ignore the bad text. I'm not having any luck
> with setErrorHandler. My error handler does not get called.
>
> Can anyone offer some help on this? Thanks

In general you can't use a real XML processor, which the Java SAX stuff
is, to read RSS feeds. Lots and lots of them aren't well-formed XML at
all. Atom 1.0 is better, but lots of feeds aren't Atom.

Once somebody ports either Jython or JRuby and gets it really running,
the problem is solved, because you can use the excellent Feedparser
library, which Just Works on any imaginable feed.

In the interim, you might want to consider John Cowan's excellent
TagSoup, which handles what its name suggests. Libxml2 also has a
"forgiving" parser, but I don't know if there's a Java interface to
that.

-T
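
For what it's worth, option (a) from the question (scrub first, then
parse) can be sketched in plain Java without any extra library. This is
not the TagSoup route suggested above, only an illustration of the
scrubbing idea. It assumes the feed is meant to be UTF-8 and that the
"invalid token" comes from stray bytes (such as a Latin-1 copyright
sign) or from control characters that XML 1.0 forbids. It will not
repair broken markup. The class name LenientFeedReader and its methods
are made up for this example, and StandardCharsets needs a reasonably
modern Java.

```java
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: scrub the raw feed bytes, then hand clean text to SAX.
final class LenientFeedReader {

    // Drop every code point outside the XML 1.0 "Char" production.
    static String stripIllegalXmlChars(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            boolean legal = cp == 0x9 || cp == 0xA || cp == 0xD
                    || (cp >= 0x20 && cp <= 0xD7FF)
                    || (cp >= 0xE000 && cp <= 0xFFFD)
                    || (cp >= 0x10000 && cp <= 0x10FFFF);
            if (legal) {
                out.appendCodePoint(cp);
            }
        }
        return out.toString();
    }

    // Assumption: the feed is supposed to be UTF-8. new String(bytes, charset)
    // replaces malformed byte sequences with U+FFFD instead of throwing.
    static String scrub(byte[] raw) {
        return stripIllegalXmlChars(new String(raw, StandardCharsets.UTF_8));
    }

    // Collect the text of every <title> element, as a stand-in for real
    // feed handling.
    static List<String> parseTitles(byte[] raw) throws Exception {
        final List<String> titles = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private StringBuilder current;

            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                if ("title".equals(qName)) {
                    current = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (current != null) {
                    current.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                if ("title".equals(qName) && current != null) {
                    titles.add(current.toString());
                    current = null;
                }
            }
        };
        // Parsing from a Reader means the parser sees already-decoded
        // characters, so the stray bytes never reach it.
        InputSource source = new InputSource(new StringReader(scrub(raw)));
        SAXParserFactory.newInstance().newSAXParser().parse(source, handler);
        return titles;
    }
}
```

You would call LenientFeedReader.parseTitles(bytes) with the bytes
downloaded from the feed. The bad byte shows up as U+FFFD in the output
rather than as the original symbol. If you would rather keep the symbol,
decoding with the charset the feed really uses is the better fix, when
you can find out what that is.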

