I'm working on the same problem right now. I'll take a look at TagSoup. Otherwise, I was just thinking of scrubbing out the invalid tokens before sending the content to the XML reader. Please let me know what you find or decide to do.
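For what it's worth, here is a rough sketch of the "scrub first" idea in Java. It rests on two assumptions of mine that I have not verified against the CNN feed: that the feed is meant to be UTF-8, and that the "invalid token" comes from stray bytes or control characters. The class and method names (FeedScrubber, scrub, parse) are made up for illustration. It decodes the raw bytes leniently, drops anything outside the XML 1.0 character range, and hands the cleaned text to the SAX parser.

```java
import java.io.StringReader;
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of any library.
final class FeedScrubber {

    // Decode as UTF-8 (an assumption about the feed), silently dropping
    // malformed byte sequences, then remove characters XML 1.0 forbids.
    static String scrub(byte[] raw) throws Exception {
        CharsetDecoder decoder = Charset.forName("UTF-8").newDecoder()
                .onMalformedInput(CodingErrorAction.IGNORE)
                .onUnmappableCharacter(CodingErrorAction.IGNORE);
        String text = decoder.decode(ByteBuffer.wrap(raw)).toString();

        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (isXmlChar(cp)) {
                out.appendCodePoint(cp);
            }
        }
        return out.toString();
    }

    // The Char production from the XML 1.0 specification.
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Parse the scrubbed text from a character stream, so the parser
    // never sees the original bytes.
    static void parse(byte[] raw, DefaultHandler handler) throws Exception {
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(scrub(raw))), handler);
    }
}
```

The catch is that this throws the bad bytes away, so a copyright symbol sent as a lone Latin-1 byte simply disappears. If the feed turns out to be ISO-8859-1, decoding with that charset instead would probably keep the symbol intact.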

On Feb 28, 8:19 pm, Tim Bray <[email protected]> wrote:
> On Sat, Feb 28, 2009 at 5:53 PM, grennis <[email protected]> wrote:
>
> > I'm using the SAX parser to read some RSS feeds and have found a
> > problem
>
> In general you can't use a real XML processor, which the java SAX
> stuff is, to read RSS feeds. Lots and lots of them aren't XML at all.
> Atom 1.0 is better, but lots of feeds aren't Atom. Once somebody
> ports either Jython or JRuby and gets it really running, the problem
> is solved because you can use the excellent Feedparser library, which
> Just Works on any imaginable feed. In the interim, you might want to
> consider John Cowan's excellent TagSoup, which handles what its name
> suggests. Libxml2 also has a "forgiving" parser but I don't know if
> there's a Java interface to that. -T
>
> > . Some feeds, for example CNN Money Top Stories, have embedded
> > some characters in their content, i.e. the copyright symbol. Well,
> > that's not valid XML and the SAXParser fails with an exception
> > "invalid token".
>
> > The only help I have seen given is to fix the XML at the source and
> > that's not an option obviously. So, I can think of 2 options and they
> > both stink: (a) read the content first, scrub it, and then pass it to
> > the parser. (b) use DOM instead of SAX.
>
> > What I *want* to do is make the parser a little more forgiving and
> > just accept or discard/ignore the bad text. I'm not having any luck
> > with setErrorHandler. My error handler does not get called.
>
> > Can anyone offer some help on this? Thanks

