Re: PaceAllowDuplicateIDs

Bill de hÓra Fri, 06 May 2005 02:25:52 -0700

Dave Johnson wrote:

Immediately after sending this message, I had a rush of second thoughts.
My point #2 is not very well thought out. I think it applies for things like earthquake data, but when Atom feeds represent blog entries or articles (in an archive or an Atom Protocol feed) the ID represents the article not an event in the blog entry's life. So, you can discount my second reason against the pace.

Good, because not everyone would agree that's what's being modeled. Now I also think your 1) starts to fall away as well, because 2) no longer holds. That is, some of the argument for the "Current best practice" you're talking about (you sure about "best" in there? ;) is predicated on entries being like events. Aggregators tend, or would like, to treat entries as singular happenings. Finally, some events are not easily modeled in the discrete way you're talking about (some of that in turn comes down to how you model time), but I don't think we have to worry about those here.

Put it this way: under either "this is an event stream" or "this is a stream of entries", having multiple entries with the same id in a single feed is not an unreasonable request. We came up with the id approach to allow people to zero in on duplicates; that's the primary case. We did that without really articulating what an entry stands for. Some effort was made post hoc, but it doesn't seem to have made it into the spec text. Consider. Is the XML entry in a feed a representation of an entry, a la REST? If so, does the id identify the representation or the entry? If the id identifies the entry representation, how do we (or should we) identify the entry? If the id identifies the entry, how do we (or need we) identify the representation?

Those are just some of the questions we could ask. We could then ask the whole set over regarding what a feed is, as that has a bearing too.

I've said this before - the technical problem we have is that, under the current spec, we can't distinguish between a buggy feed with the same ids and an aggregate feed with the same ids. Because you can't have it both ways, that rationale should have been provided in the spec. It's an architectural constraint - you cannot say this with your feed because we can't make sense of it - and it's there not to preserve some notion of identity we have here, but to allow people to drop duplicates and normalize their streams. The downside is that 1) some people do want to aggregate versions of an entry in a single feed, which presumably have the same id, so "they or others can say, these are all of the same thing", and 2) some people do re-edit their entries or move their entry dates around so the entry reappears with updated content. Bray does this with his Sunday server logs, and no one has convinced me it's not as questionable an approach at some level as allowing duplicate ids, or munging URLs the way ad people do - who cares as long as it gets into people's clients?

And it's clear now that ids alone don't solve the "get rid of all those duplicates" problem. Dates are required as well, to cater for cases where someone updates an entry and we don't want people to miss that because of over-eager reaping on the clients.
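
To make that concrete, here is a rough sketch in Python of the two reaping strategies. It assumes entries are plain dicts with hypothetical "id", "updated" and "title" keys; none of this is spec text or any real aggregator's data model, it is only an illustration.

```python
from datetime import datetime


def reap_by_id(entries):
    """Keep the first entry seen for each id; later ones are dropped."""
    kept = {}
    for entry in entries:
        # setdefault only stores the entry if this id has not been seen yet
        kept.setdefault(entry["id"], entry)
    return list(kept.values())


def reap_by_id_and_date(entries):
    """Keep, for each id, the entry with the latest 'updated' timestamp."""
    kept = {}
    for entry in entries:
        seen = kept.get(entry["id"])
        if seen is None or entry["updated"] > seen["updated"]:
            kept[entry["id"]] = entry
    return list(kept.values())


# Hypothetical feed: the third entry is a later revision of the first.
feed = [
    {"id": "tag:example.org,2005:1", "updated": datetime(2005, 5, 1), "title": "first draft"},
    {"id": "tag:example.org,2005:2", "updated": datetime(2005, 5, 2), "title": "other"},
    {"id": "tag:example.org,2005:1", "updated": datetime(2005, 5, 6), "title": "revised"},
]
```

With id-only reaping the "revised" entry is thrown away as a duplicate, which is the over-eager case; with the date in play the client keeps the revision instead. Neither function can tell whether the repeated id was a bug or a deliberate aggregate of versions, which is the underlying problem.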

It's a mess, and that's our fault. As a first step we need to be able to say what's being identified. If we decide we don't care about that and just want to illegalise duplicates, then maybe ids were not the right idea to begin with.

cheers
Bill
