I'm working on the same problem right now. I'll take a look at
TagSoup. Otherwise, I was just thinking of scrubbing out the invalid
tokens before sending the feed to the XML reader. Please let me know what
you find or decide to do.
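A minimal sketch of the scrubbing idea, assuming Java 8+ (for `String.codePoints()`) and a feed that has already been decoded into a `String` with the correct charset. The names `FeedScrubber`, `isXmlChar`, `stripInvalidXmlChars` and `parseScrubbed` are hypothetical. It only drops code points outside the XML 1.0 `Char` production; it cannot repair structural damage such as a bare `&` or unclosed tags, which is the kind of input TagSoup is meant for.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: removes characters XML 1.0 forbids, then hands the text to SAX.
public class FeedScrubber {

    // XML 1.0 Char: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Drops every code point outside the Char production (this includes unpaired surrogates).
    static String stripInvalidXmlChars(String text) {
        StringBuilder out = new StringBuilder(text.length());
        text.codePoints().filter(FeedScrubber::isXmlChar).forEach(out::appendCodePoint);
        return out.toString();
    }

    // Parses the scrubbed text as a character stream, so the parser does no byte decoding itself.
    static void parseScrubbed(String feedText, DefaultHandler handler)
            throws ParserConfigurationException, SAXException, IOException {
        InputSource source = new InputSource(new StringReader(stripInvalidXmlChars(feedText)));
        SAXParserFactory.newInstance().newSAXParser().parse(source, handler);
    }
}
```

Because the parser is given characters rather than bytes, any `encoding` declared in the feed is ignored; choosing the right charset when reading the HTTP response is left to the caller.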


On Feb 28, 8:19 pm, Tim Bray <[email protected]> wrote:
> On Sat, Feb 28, 2009 at 5:53 PM, grennis <[email protected]> wrote:
>
> > I'm using the SAX parser to read some RSS feeds and have found a
> > problem.
>
> In general you can't use a real XML processor, which the Java SAX
> stuff is, to read RSS feeds.  Lots and lots of them aren't XML at all.
> Atom 1.0 is better, but lots of feeds aren't Atom.  Once somebody
> ports either Jython or JRuby and gets it really running, the problem
> is solved because you can use the excellent Feedparser library, which
> Just Works on any imaginable feed.  In the interim, you might want to
> consider John Cowan's excellent TagSoup, which handles what its name
> suggests. Libxml2 also has a "forgiving" parser but I don't know if
> there's a Java interface to that. -T
>
> > Some feeds, for example CNN Money Top Stories, have embedded
> > some characters in their content, e.g. the copyright symbol. Well,
> > that's not valid XML and the SAXParser fails with an exception
> > "invalid token".
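A hedged aside on the error itself: U+00A9 (the copyright sign) is a legal XML character, so an "invalid token" error on it often points to an encoding mismatch, for example a feed that declares (or defaults to) UTF-8 but actually sends ISO-8859-1 bytes. The sketch below, with hypothetical names `EncodingDemo` and `parsesCleanly`, feeds the same document to the standard `javax.xml.parsers` SAX API twice: once as bytes that match the declared encoding, once as Latin-1 bytes mislabelled as UTF-8. The exact exception type and message differ between parsers (Android's is not the desktop JDK's), so it only checks success versus failure.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class EncodingDemo {

    // A document that declares UTF-8 and contains the copyright sign.
    static final String DOC = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><p>\u00a9 2009</p>";

    // Returns true if the raw bytes parse as well-formed XML, false on any parse or decoding error.
    static boolean parsesCleanly(byte[] bytes) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new ByteArrayInputStream(bytes), new DefaultHandler());
            return true;
        } catch (SAXException | IOException e) {
            return false;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Bytes match the declaration: the copyright sign is accepted.
        System.out.println(parsesCleanly(DOC.getBytes(StandardCharsets.UTF_8)));
        // A lone 0xA9 byte is not valid UTF-8, so the mislabelled Latin-1 version is rejected.
        System.out.println(parsesCleanly(DOC.getBytes(StandardCharsets.ISO_8859_1)));
    }
}
```

If this is the cause, decoding the response with its real charset (and passing a `Reader`) is usually enough, with no scrubbing needed.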
>
> > The only help I have seen given is to fix the XML at the source and
> > that's not an option obviously. So, I can think of 2 options and they
> > both stink: (a) read the content first, scrub it, and then pass it to
> > the parser; (b) use DOM instead of SAX.
>
> > What I *want* to do is make the parser a little more forgiving and
> > just accept or discard/ignore the bad text. I'm not having any luck with
> > setErrorHandler. My error handler does not get called.
>
> > Can anyone offer some help on this? Thanks
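On the `setErrorHandler` question: in SAX, malformed markup is a fatal error, and an `ErrorHandler` can observe it but cannot make a conforming parser keep going. The sketch below uses hypothetical names (`FatalErrorDemo`, `RecordingHandler`, `parserAborted`) and assumes the desktop JDK's built-in parser, which calls `fatalError` and then aborts with a `SAXParseException` even when the handler returns normally. Android's parser may behave differently, possibly not calling the handler at all, which would match what was described above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class FatalErrorDemo {

    // Records fatal errors instead of rethrowing them (DefaultHandler's default is to rethrow).
    static class RecordingHandler extends DefaultHandler {
        final List<String> fatalErrors = new ArrayList<>();

        @Override
        public void fatalError(SAXParseException e) {
            fatalErrors.add(e.getMessage());
        }
    }

    // Returns true if parsing still ended in a SAXParseException despite the handler swallowing errors.
    static boolean parserAborted(String xml, RecordingHandler handler) throws Exception {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return false;
        } catch (SAXParseException e) {
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        RecordingHandler handler = new RecordingHandler();
        // A bare '&' is a well-formedness error, which SAX treats as fatal.
        boolean aborted = parserAborted("<title>Q&A</title>", handler);
        System.out.println("fatal errors seen: " + handler.fatalErrors.size() + ", aborted: " + aborted);
    }
}
```

So a custom error handler is useful for logging, but recovering from broken feeds needs either pre-processing or a lenient parser such as TagSoup.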
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google
Groups "Android Developers" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/android-developers?hl=en
-~----------~----~----~----~------~----~------~--~---