Tonight something incredible happened to me. You won't believe it. I was walking
back from the pubs when I got snatched by a passing spaceship full
of hyper-advanced aliens. They did various experiments on me, and cloned me 1000
times. It is terrible. I just don't know what to do.


I suppose that means I am +1000 on this now.

:-) That's consensus, I am sure.

Henry
http://bblfish.net/blog/

On 5 May 2005, at 06:02, Tim Bray wrote:
<co-chair-hat status="OFF">

http://www.intertwingly.net/wiki/pie/PaceAllowDuplicateIDs

This Pace was motivated by a talk I had with Bob Wyman today about the problems the synthofeed-generator community has.

Summary:
1. There are multiple plausible use-cases for feeds with duplicate IDs
2. Pro and Contra
3. Alternate Paces
4. Details about this Pace

1. Use-Cases

Here's a stream of stock-market quotes.

<feed><title>My Portfolio</title>
 ....
 <entry><title>MSFT</title>
  <updated>2005-05-03T10:00:00-05:00</updated>
  <content>Bid: 25.20 Ask: 25.50 Last: 25.20</content>
  </entry>
 <entry><title>MSFT</title>
  <updated>2005-05-03T11:00:00-05:00</updated>
  <content>Bid: 25.15 Ask: 25.25 Last: 25.20</content>
  </entry>
 <entry><title>MSFT</title>
  <updated>2005-05-03T12:00:00-05:00</updated>
  <content>Bid: 25.10 Ask: 25.15 Last: 25.10</content>
  </entry>
</feed>

You could also imagine a stream of weather readings. Bob's actual, here-and-now use-case at PubSub is earthquakes: an entry describes an earthquake, and they keep re-issuing it as new information about its strength and location comes in.

Some people only care about the most recent version of the entry, others might want to see all of them. Basically, each atom:entry element describes the same Entry, only at a different point in time.
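
To make the "same Entry at different points in time" reading concrete, here's a rough Python sketch (stdlib only) that groups a feed's entries by atom:id and orders each group by atom:updated. It assumes un-namespaced elements, as in the portfolio example; a real Atom feed would need namespace-qualified lookups. The <id> values and the tag: URI are hypothetical, since the example omits atom:id, and the entries are deliberately out of order to show the sort doesn't rely on document order.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Hypothetical feed: the portfolio example with an <id> added to each entry.
# All three entries share one atom:id; elements are un-namespaced.
FEED = """<feed><title>My Portfolio</title>
 <entry><id>tag:example.org,2005:msft</id><title>MSFT</title>
  <updated>2005-05-03T10:00:00-05:00</updated>
  <content>Bid: 25.20 Ask: 25.50 Last: 25.20</content>
 </entry>
 <entry><id>tag:example.org,2005:msft</id><title>MSFT</title>
  <updated>2005-05-03T12:00:00-05:00</updated>
  <content>Bid: 25.10 Ask: 25.15 Last: 25.10</content>
 </entry>
 <entry><id>tag:example.org,2005:msft</id><title>MSFT</title>
  <updated>2005-05-03T11:00:00-05:00</updated>
  <content>Bid: 25.15 Ask: 25.25 Last: 25.20</content>
 </entry>
</feed>"""

def group_versions(feed_xml):
    """Map each atom:id to its (updated, content) versions, oldest first."""
    root = ET.fromstring(feed_xml)
    versions = {}
    for entry in root.findall("entry"):
        # Parse the timestamp so ordering is by instant, not by string.
        updated = datetime.fromisoformat(entry.findtext("updated"))
        versions.setdefault(entry.findtext("id"), []).append(
            (updated, entry.findtext("content")))
    for history in versions.values():
        history.sort(key=lambda pair: pair[0])
    return versions

versions = group_versions(FEED)
latest = versions["tag:example.org,2005:msft"][-1]  # most recent version only
```

Software that only cares about the latest version takes the last item of each list; software that wants everything keeps the whole list.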

You could argue that in some cases, these are representations of the Web resources identified by the atom:id URI, but I don't think we need to say that explicitly.

Yes, you could think of alternate ways of representing stock quotes or any of the other use-cases but this is simple and direct and idiomatic.

2. Pro and Contra

Given that I issued the consensus call rejecting the last attempt to do this, which was PaceRepeatIdInDocument, I felt nervous about revisiting the issue. So I went and reviewed the discussion around that one, which I extracted and placed at http://www.tbray.org/tmp/RepeatID.txt for the WG's convenience.

Reviewing that discussion, I'm actually not impressed. There were a few -1's but very few actual technical arguments about why this shouldn't be done. The most common was "Software will screw this up". On reflection, I don't believe that. You have a bunch of Entries, some of them have the same ID and are distinguished by datestamp. Some software will show the latest, some will show all of them, the good software will allow switching back and forth. Doesn't seem like rocket science to me.

So here's how I see it: there are plausible use cases for doing this, and one of the leading really large-scale implementors in the space (PubSub) wants to do this right now. Bob's been making strong claims about not being able to use Atom if this restriction remains in place.

I believe strongly that if there's something that implementors want to do, standards shouldn't get in the way unless there's real interoperability damage. I'm certainly prepared to believe that this could cause interoperability damage, but to date I haven't seen any convincing arguments that it will. I think that if we nonetheless forbid it, people who want to do this will (a) use RSS instead of Atom, (b) cook up horrible kludges, or (c) ignore us and just do it.

So my best estimate is that the cost of allowing dupes is probably much lower than the cost of forbidding them.

Finally, our charter does say that we're also supposed to specify how you'd go about archiving feeds, and AllowDuplicateIDs makes this trivial. I looked around and failed to find how we claimed we were going to do that while still forbidding duplicates, but it's possible I missed that.

3. Alternate Paces

I didn't want to just revive PaceRepeatIdInDocument, because it used the word "version" in what I thought was kind of a sloppy way, and because it wasn't current against format-08. I don't like either PaceDuplicateIDWithSource or ...WithSource2; they are complicated and don't really meet PubSub's needs anyhow. So I'm strongly -1 on both of those. Yes, that means that if this Pace fails, we'll allow no duplicates at all. I prefer either "dupes OK" or "no dupes" to "dupes OK in the following circumstances"; it's cleaner.

4. Details

Section 4.1.2 of format-08 says that atom:entry "represents an individual entry". The Pace says that if you have dupes, they "represent the same entry", which I think is consistent with both the letter and spirit of 4.1.2.

The Pace discourages duplicate timestamps without resorting to MUST language, because accidents can happen; this allows software to throw such entries on the floor while positively encouraging noisy complaining. On the other hand, if the WG wanted either to insist on a MUST here or remove the discouragement altogether I could live with that.
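
A minimal sketch of the "throw such entries on the floor while complaining noisily" behaviour, assuming entries have already been parsed into a hypothetical Entry record. The Pace doesn't say which of the duplicates to keep; this sketch keeps the first one seen, which is just one possible policy. Timestamps are compared as instants, so 11:00-05:00 and 16:00Z count as the same.

```python
import logging
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass(frozen=True)
class Entry:
    """Hypothetical parsed form of one atom:entry."""
    id: str
    updated: datetime
    content: str

def drop_duplicate_timestamps(entries):
    """Keep the first entry for each (atom:id, atom:updated) pair.

    Later entries with the same pair are dropped and logged as warnings.
    Aware datetimes compare (and hash) as instants, so different UTC
    offsets that denote the same moment are treated as duplicates.
    """
    seen = set()
    kept = []
    for entry in entries:
        key = (entry.id, entry.updated)
        if key in seen:
            logging.warning("dropping entry %s: duplicate atom:updated %s",
                            entry.id, entry.updated.isoformat())
            continue
        seen.add(key)
        kept.append(entry)
    return kept

# Hypothetical earthquake reports; the revision was re-sent with a UTC stamp.
est = timezone(timedelta(hours=-5))
quake = "tag:example.org,2005:quake-1"
entries = [
    Entry(quake, datetime(2005, 5, 3, 10, tzinfo=est), "first report"),
    Entry(quake, datetime(2005, 5, 3, 11, tzinfo=est), "revised report"),
    Entry(quake, datetime(2005, 5, 3, 16, tzinfo=timezone.utc),
          "revised report (resent)"),
]
kept = drop_duplicate_timestamps(entries)
```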

Finally, it makes it clear that if there are entries with duplicate atom:id, software is free to display all or a subset, and calls out the likely common case where you discard all but the most recent. If I were Brent Simmons or equivalent, I'd be coding up a button where you can arrange to show them all or just one.
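
The show-all/show-latest button might look something like this in a reader. visible_entries and its input shape (each atom:id mapped to (updated, content) pairs, oldest first) are hypothetical, chosen for illustration; nothing in the Pace defines them.

```python
from datetime import datetime, timedelta, timezone

def visible_entries(versions_by_id, show_all):
    """Flatten entry versions for display, newest first.

    versions_by_id maps an atom:id to a list of (updated, content) pairs,
    oldest first. With show_all=False only the last (most recent) pair for
    each id is kept; with show_all=True every version is returned.
    """
    if show_all:
        rows = [(entry_id, updated, content)
                for entry_id, versions in versions_by_id.items()
                for updated, content in versions]
    else:
        rows = [(entry_id, *versions[-1])
                for entry_id, versions in versions_by_id.items() if versions]
    return sorted(rows, key=lambda row: row[1], reverse=True)

# Hypothetical history for one entry, using the quotes from the example feed.
est = timezone(timedelta(hours=-5))
history = {"tag:example.org,2005:msft": [
    (datetime(2005, 5, 3, 10, tzinfo=est), "Bid: 25.20 Ask: 25.50 Last: 25.20"),
    (datetime(2005, 5, 3, 11, tzinfo=est), "Bid: 25.15 Ask: 25.25 Last: 25.20"),
    (datetime(2005, 5, 3, 12, tzinfo=est), "Bid: 25.10 Ask: 25.15 Last: 25.10"),
]}
latest_rows = visible_entries(history, show_all=False)  # button: "just one"
all_rows = visible_entries(history, show_all=True)      # button: "show them all"
```

Keeping the full history and filtering at display time means flipping the button never requires re-fetching the feed.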

</co-chair-hat>
