On Aug 12, 2012, at 4:34 AM, Christian Kienle <uin....@googlemail.com> wrote:

> I reported these assertions years ago but did get no response.

Disclaimer: I wrote about ⅔ of the PubSub framework.
After I left Apple at the end of 2007 I don't think anyone else put any work 
into the framework.

> If I had to do it again I would simply use the NSXML* classes to write my own 
> Atom/RSS parser. It should not be that hard and you can certainly do a better 
> job as the PubSub team did.

*hollow laugh*

Fetching feeds is _hard_, at least if you want to fetch arbitrary feeds. RSS is 
a horribly vague file format, and many of the people implementing feeds on 
websites seem to just dump content into PHP templates instead of using any real 
XML API, so you run into lots and lots of problems like
— There are at least 9 (by Mark Pilgrim's count) different published dialects 
of RSS and Atom
— There are various metadata extensions like Dublin Core you have to be aware of
— Some feeds are malformed XML and have to be cleaned up before they can be 
parsed at all
— There are so very many different date formats people use. I think we ran into 
at least 20. You need either a very smart custom date parser or just a list of 
20+ formats to attempt to parse, one after another. Oh, and time zones and DST 
are super fun to deal with.
— A lot of article bodies contain malformed HTML, often just tag soup that some 
blogger typed in by hand
— Many feeds have problems with quoting in headlines or article bodies, which 
requires using heuristics to figure out whether or not they really meant 
"&amp;" or an ampersand
— Likewise for whether articles are HTML or plain text
— Uniquing items between fetches can be challenging, especially for older feed 
formats that don't have article UUIDs or permalinks.
— Remember to use conditional GETs or websites will get mad at you for spamming 
their feeds
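
To make the "list of formats to attempt, one after another" idea concrete, here is a minimal Python sketch. It is not the PubSub implementation. The function name parse_feed_date, the short FALLBACK_FORMATS list, and the choice to treat dates without a time zone as UTC are all assumptions made for illustration; a real reader would need a much longer list.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# Hypothetical fallback list; real-world feeds need many more entries.
FALLBACK_FORMATS = [
    "%Y-%m-%dT%H:%M:%S%z",  # ISO 8601 with offset ("Z" is accepted on Python 3.7+)
    "%Y-%m-%dT%H:%M:%S",    # ISO 8601 without offset
    "%Y-%m-%d %H:%M:%S",
    "%Y-%m-%d",
]


def parse_feed_date(text):
    """Return an aware UTC datetime, or None if no known format matches."""
    text = text.strip()
    parsed = None

    # First try the RFC 822 style that RSS 2.0 nominally uses.
    try:
        parsed = parsedate_to_datetime(text)
    except (TypeError, ValueError):
        parsed = None

    # Then walk the fallback formats, one after another.
    if parsed is None:
        for fmt in FALLBACK_FORMATS:
            try:
                parsed = datetime.strptime(text, fmt)
                break
            except ValueError:
                continue

    if parsed is None:
        return None

    # Assumption: a date with no time zone is treated as UTC.
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)
```

Returning None instead of raising lets the caller decide what to do with an undated item, for example falling back to the time of the fetch.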

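For the conditional GET point in the list above, a rough sketch using only Python's standard library might look like the following. The helper names conditional_headers and fetch_if_changed are hypothetical, and the sketch assumes the caller has stored the ETag and Last-Modified values from the previous successful fetch.

```python
import urllib.error
import urllib.request


def conditional_headers(etag=None, last_modified=None):
    """Build request headers from the validators saved on the last fetch."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def fetch_if_changed(url, etag=None, last_modified=None):
    """Return (body, etag, last_modified); body is None if the feed is unchanged."""
    request = urllib.request.Request(
        url, headers=conditional_headers(etag, last_modified)
    )
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return (
                response.read(),
                response.headers.get("ETag"),
                response.headers.get("Last-Modified"),
            )
    except urllib.error.HTTPError as error:
        # urllib reports "304 Not Modified" as an HTTPError.
        if error.code == 304:
            return (None, etag, last_modified)
        raise
```

When the server answers 304, nothing is downloaded or re-parsed, which is what keeps site owners from getting mad at you.
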
Now, if there's only one feed that your application needs to fetch, and if 
you're responsible for creating that feed on the server side (or can influence 
the people who are), a lot of these problems go away because you can 
enforce that everything is using the correct formats. But we weren't so lucky.

—Jens
_______________________________________________

Cocoa-dev mailing list (Cocoa-dev@lists.apple.com)