On Apr 7, 2005, at 10:34 AM, Bob Wyman wrote:
> Tim suggests that aggregators should be able to rely simply on
> atom:id to detect duplicates. However, as has often been pointed out,
> applying this rule in an intermediary like PubSub would simply make PubSub a
> marvelously efficient tool for denial of service attacks. I.e. if I didn't
> like something you published, I would simply publish something in my blog
> that had the same atom:id as something you had published. PubSub and other
> synthetic feed producers would then flush your post from the system and
> replace it with my post... Not good -- and not avoidable given the current
> loose rules for defining instances of atom:id.
Yes. Mea culpa; somehow I'd missed this. Would the following work:
1. A new feed-level element <atom:alt-uri-prefix>, any number allowed. E.g.
  <feed>
    <link>http://www.tbray.org/ongoing/</link>
    <alt-uri-prefix>http://www.tbray.org/</alt-uri-prefix>
    <alt-uri-prefix>http://tbray.org/</alt-uri-prefix>
    <alt-uri-prefix>http://www.textuality.com</alt-uri-prefix>
It says: an atom:entry can't be a duplicate of an atom:entry in this feed unless it comes from a feed whose URI begins with one of these. In conjunction with atom:id and atom:updated, it would solve most of PubSub's problem, but note that it only solves the problem for one level of aggregation. Which I suspect is a useful 80/20 point.
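The rule can be sketched in a few lines of Python. This is a hypothetical dedup store, not anything PubSub actually runs; it assumes each entry carries its atom:id, its atom:updated timestamp, and the URI of the feed it arrived from, and that prefix matching is plain string comparison:

```python
def allowed_duplicate(source_uri, alt_uri_prefixes):
    # An entry may displace one already held under the same atom:id only
    # if it arrived from a feed whose URI begins with a declared prefix.
    return any(source_uri.startswith(p) for p in alt_uri_prefixes)

def ingest(store, entry, alt_uri_prefixes):
    # store maps atom:id -> entry; "newer atom:updated wins" is gated by
    # the alt-uri-prefix check, so a stranger's feed can't flush the post.
    held = store.get(entry["id"])
    if held is None:
        store[entry["id"]] = entry
    elif (entry["updated"] > held["updated"]
          and allowed_duplicate(entry["source"], alt_uri_prefixes)):
        store[entry["id"]] = entry

# Hypothetical entries illustrating the attack and the fix.
prefixes = ["http://www.tbray.org/", "http://tbray.org/"]
store = {}
ingest(store, {"id": "tag:tbray.org,2005:p1",
               "updated": "2005-04-07T10:00:00Z",
               "source": "http://www.tbray.org/ongoing/",
               "title": "original"}, prefixes)
ingest(store, {"id": "tag:tbray.org,2005:p1",
               "updated": "2005-04-07T11:00:00Z",
               "source": "http://attacker.example.net/",
               "title": "spoof"}, prefixes)
print(store["tag:tbray.org,2005:p1"]["title"])  # -> original
```

The spoofed entry has the same atom:id and a later atom:updated, but its feed URI matches no declared prefix, so it is rejected rather than evicting the victim's post.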
By the way... bear in mind that virtually all duplicates are coming via things like Technorati and PubSub, so if we can figure out a fix, the number of players who need to implement it is small, and the motivation for them to do it is high. -Tim
