On Sun, Mar 1, 2009 at 12:48 AM, 3D <[email protected]> wrote:
>
> I'm working on the same problem right now. I'll take a look at
> TagSoup. Otherwise, I was just thinking of scrubbing out the invalid
> tokens before sending it to the xml reader. Please let me know what
> you find/decide to do.

Scrubbing it will almost certainly not work. There is some seriously weird shit in RSS feeds out there, not just wonky characters. The reason is that most blog authoring systems let you grab arbitrary claims-to-be-HTML off the web and drop it into your blog, so it ends up in your feed, and even with the double-escaping voodoo you see in RSS, the poison remains.

As an interim step, you could simply take Atom when there's a choice of feeds, and refuse to process bad RSS. The proportion of feeds that have Atom alternatives available is pretty high. The reason this works is that one or two of the leading feed readers decided to use real persnickety XML parsers for Atom, so the publishing industry has done the necessary whatevers to make sure they're clean.

The *right* answer is FeedParser, sigh.

-Tim

You received this message because you are subscribed to the Google Groups "Android Developers" group.
For more options, visit this group at http://groups.google.com/group/android-developers?hl=en
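
A minimal sketch of the interim step suggested in the reply (take Atom when both feeds exist, refuse RSS that is not well-formed), assuming the feed bodies have already been fetched as strings. The class name `FeedGate` and its methods are hypothetical, made up for illustration. Only the standard `javax.xml.parsers` DOM API is used, and "strict" here means a well-formedness check, nothing more.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of any library.
class FeedGate {
    // Namespace URI of Atom 1.0 (RFC 4287).
    static final String ATOM_NS = "http://www.w3.org/2005/Atom";

    // Returns the parsed document, or null if the text is not well-formed XML.
    static Document parseStrict(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of logging them.
            builder.setErrorHandler(new DefaultHandler());
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            return null; // no scrubbing and no retry: just refuse
        }
    }

    // True if the root element is <feed> in the Atom namespace.
    static boolean isAtom(Document doc) {
        Element root = doc.getDocumentElement();
        return root != null
                && ATOM_NS.equals(root.getNamespaceURI())
                && "feed".equals(root.getLocalName());
    }

    // Prefer a well-formed Atom feed; otherwise accept RSS only if it
    // parses cleanly. A null result means the feed is refused.
    static Document choose(String atomXml, String rssXml) {
        if (atomXml != null) {
            Document atom = parseStrict(atomXml);
            if (atom != null && isAtom(atom)) {
                return atom;
            }
        }
        return rssXml != null ? parseStrict(rssXml) : null;
    }
}
```

With this, `FeedGate.choose(atomBody, rssBody)` hands back the Atom document when one is available and clean, and returns null for an RSS feed that contains, say, a bare `&`, so the caller can skip that feed rather than try to repair it.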

