On Apr 7, 2005, at 10:34 AM, Bob Wyman wrote:
> Tim suggests that aggregators should be able to rely simply on
> atom:id to detect duplicates. However, as has often been pointed out,
> applying this rule in an intermediary like PubSub would simply make PubSub a
> marvelously efficient tool for denial of service attacks. I.e. if I didn't
> like something you published, I would simply publish something in my blog
> that had the same atom:id as something you had published. PubSub and other
> synthetic feed producers would then flush your post from the system and
> replace it with my post... Not good -- and not avoidable given the current
> loose rules for defining instances of atom:id.
Yes. Mea culpa; somehow I'd missed this. Would the following work:
1. A new feed-level element <atom:alt-uri-prefix>, any number allowed. E.g.
  <feed>
    <link>http://www.tbray.org/ongoing/</link>
    <alt-uri-prefix>http://www.tbray.org/</alt-uri-prefix>
    <alt-uri-prefix>http://tbray.org/</alt-uri-prefix>
    <alt-uri-prefix>http://www.textuality.com</alt-uri-prefix>
It says: an atom:entry can't be a duplicate of an atom:entry in this feed unless it comes from a feed whose URI begins with one of these. In conjunction with atom:id and atom:updated, it would solve most of PubSub's problem, but note that it only solves the problem for one level of aggregation. Which I suspect is a useful 80/20 point.
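The rule can be sketched in a few lines of Python. This is a hypothetical dedup store, not anything PubSub actually runs; it assumes each entry carries its atom:id, its atom:updated timestamp, and the URI of the feed it arrived from, and that prefix matching is plain string comparison:

```python
def allowed_duplicate(source_uri, alt_uri_prefixes):
    # An entry may displace one already held under the same atom:id only
    # if it arrived from a feed whose URI begins with a declared prefix.
    return any(source_uri.startswith(p) for p in alt_uri_prefixes)

def ingest(store, entry, alt_uri_prefixes):
    # store maps atom:id -> entry; "newer atom:updated wins" is gated by
    # the alt-uri-prefix check, so a stranger's feed can't flush the post.
    held = store.get(entry["id"])
    if held is None:
        store[entry["id"]] = entry
    elif (entry["updated"] > held["updated"]
          and allowed_duplicate(entry["source"], alt_uri_prefixes)):
        store[entry["id"]] = entry

# Hypothetical entries illustrating the attack and the fix.
prefixes = ["http://www.tbray.org/", "http://tbray.org/"]
store = {}
ingest(store, {"id": "tag:tbray.org,2005:p1",
               "updated": "2005-04-07T10:00:00Z",
               "source": "http://www.tbray.org/ongoing/",
               "title": "original"}, prefixes)
ingest(store, {"id": "tag:tbray.org,2005:p1",
               "updated": "2005-04-07T11:00:00Z",
               "source": "http://attacker.example.net/",
               "title": "spoof"}, prefixes)
print(store["tag:tbray.org,2005:p1"]["title"])  # -> original
```

The spoofed entry has the same atom:id and a later atom:updated, but its feed URI matches no declared prefix, so it is rejected rather than evicting the victim's post.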
By the way... bear in mind that virtually all duplicates are coming via things like Technorati and PubSub, so if we can figure out a fix, the number of players who need to implement it is small, and the motivation for them to do it is high. -Tim
