On Sun, Mar 1, 2009 at 12:48 AM, 3D <[email protected]> wrote:
>
> I'm working on the same problem right now.  I'll take a look at
> TagSoup.  Otherwise, I was just thinking of scrubbing out the invalid
> tokens before sending it to the xml reader.  Please let me know what
> you find/decide to do.

Scrubbing it will almost certainly not work.  There is some seriously
weird shit in RSS feeds out there.  Not just wonky characters.  The
reason is that most blog authoring systems let you grab arbitrary
claims-to-be-html off the web and drop it into your blog, so it ends
up in your feed, and even with the double-escaping voodoo you see in
RSS, the poison remains.

As an interim step, you could simply take Atom when there's a choice
of feeds, and refuse to process bad RSS.  The proportion of feeds that
have Atom alternatives available is pretty high.  The reason this
works is that one or two of the leading feed-readers decided to use
real persnickety XML parsers for Atom, so the publishing industry has
done the necessary whatevers to make sure they're clean.
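
A rough sketch of that interim step, in Python for brevity rather than
Android Java. It assumes you have already fetched each alternative feed
as a (MIME type, body) pair. The name pick_feed and the two constants
are made up for illustration. It tries Atom first, parses strictly, and
drops anything that is not well-formed instead of trying to repair it.

```python
import xml.etree.ElementTree as ET

# MIME types commonly advertised for feed alternatives.
ATOM_TYPE = "application/atom+xml"
RSS_TYPE = "application/rss+xml"


def pick_feed(alternatives):
    """Return (mime_type, root) for the first alternative that parses
    as well-formed XML, trying Atom before anything else.

    `alternatives` is a list of (mime_type, body) pairs; `pick_feed`
    is a hypothetical helper, not part of any library.
    Returns None when every alternative is malformed.
    """
    # sorted() is stable, so Atom entries move to the front and the
    # rest keep their original order.
    ordered = sorted(alternatives, key=lambda alt: alt[0] != ATOM_TYPE)
    for mime_type, body in ordered:
        try:
            # Strict parse: any well-formedness error rejects the feed.
            root = ET.fromstring(body)
        except ET.ParseError:
            continue
        return mime_type, root
    return None
```

An RSS body containing a bare "AT&T" fails the strict parse and is
skipped, so if a clean Atom alternative exists it wins, and if not you
get None and can decline to show the feed.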

The *right* answer is FeedParser, sigh.  -Tim

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google
Groups "Android Developers" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/android-developers?hl=en
-~----------~----~----~----~------~----~------~--~---
