> +1. The timestamp can still be useful when dealing with more complex
> sync use cases.
>
> In particular, without the timestamp(s) it is hard to see how the
> tombstones fit in with feed paging and date-range-based queries.


This concerned me as well. Without the timestamps, the end result is a server spitting out tombstones for every item that's ever been deleted on every feed request, since it has no knowledge of what the client is holding or what the client is interested in holding/syncing.

I was also concerned about this during the prev/next discussions, but bit my tongue when folks suggested that a page they've seen before shouldn't ever change. I don't agree. I just edited an entry from 1968. It might be buried on either page=279 or weblog/AUG-1968, but either way, it changed, and a client that really wants to stay in sync should be made aware that a change was made. I also recently removed some copyrighted material from my site, but couldn't remove said items from the perma-cache of a certain news reader whose company motto is 'Do No Evil'. Looks like a DMCA take-down is the only way to get them to flush the nuked entries.

How does one sync this stuff? The only thing that makes sense is for the client to tell the server what the client is holding and what it is interested in syncing. Then one can look at modification times and gather a list of changes to push out. In the case of a file from 1968 that was modified, a collection of updated entries starting at a client-specified time could be pushed.
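As a rough sketch of what that would look like (all names and ids here are hypothetical, not from any actual spec): the client hands over the time of its last sync, and the server pushes back every entry modified since then, however old the entry's publication date is.

```python
from datetime import datetime, timezone

# Hypothetical in-memory "server": entries keyed by id, each carrying a
# modification time. The 1968 entry was edited recently, so a sync should
# still surface it even though it is decades old.
entries = {
    "urn:entry:1968-08-01": {"modified": datetime(2004, 7, 1, tzinfo=timezone.utc),
                             "title": "Edited entry from 1968"},
    "urn:entry:2004-06-30": {"modified": datetime(2004, 6, 30, tzinfo=timezone.utc),
                             "title": "Recent entry"},
}

def changes_since(last_sync):
    """Return ids of entries modified after the client's last sync time."""
    return [eid for eid, e in entries.items() if e["modified"] > last_sync]
```

A client that last synced at noon on 2004-06-30 would get back only the 1968 entry's id, since that is the only entry modified after that instant.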

In the case of deletions, there are a few possibilities. In theory it would actually be a good idea for the client to tell the server what IDs it already has. All of them. Then the server can respond saying that id 1234 was deleted. This is hardly efficient, but it removes any need for the server to maintain state about deleted items. So I can concede that the quick and dirty solution is for the server to keep a tombstone list, even if I find it personally disturbing for the server to maintain this state. Given these constraints, the only thing that makes sense in terms of normal 'feeds' is to emit these in the proper 'order' with whatever other entries are being emitted. But there's no historical starting point. No knowledge of what the client is holding. So you've gotta send out all deletions. All of them. Every time a client connects. Forever. At least if they're in item order, you can limit the deleted-item list to the range of items currently being emitted. This might work in the classic sense of 'working'; at least there's some means of communicating deletions, but it doesn't solve the copyright take-down problem for items that fall outside the currently requested feed range.
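The stateless "client sends all its IDs" variant is just a set difference on the server side. A minimal sketch (the function name and the ids are mine, for illustration):

```python
# The client ships every id it holds; the server answers with the ids it
# no longer has. No tombstone list is kept server-side -- the cost is that
# the full id list crosses the wire on every sync.
def deleted_ids(client_ids, server_ids):
    """Ids the client holds that no longer exist on the server."""
    return sorted(set(client_ids) - set(server_ids))
```

So if the client holds ids "a", "b", and "1234" and the server only still has "a" and "b", the response is that "1234" was deleted.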

The right way to do it is to emit these via a 'sync' operation where the client specifies a start date of interest and the date of the last request it made for changes. Then you can figure out that the client's earliest cared-about entry is in 1943, and you can send back the message from 1968 that was updated last week, as well as the deletion of a copyrighted message from 1982 that happened yesterday. When the client connects again next week, you'll have another set of changes that occurred since now. This is the most efficient way of doing it (ignoring the inefficiencies of protocol chatter).
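The two client-supplied dates above can be sketched as a single server-side filter (a hypothetical shape, not any real protocol: `published`, `modified`, and `deleted` are assumed field names):

```python
from datetime import datetime, timezone

# Client-driven sync: start_of_interest bounds which entries the client
# cares about at all (e.g. everything since 1943); last_checked bounds
# which *changes* to report (e.g. everything that happened this week).
def sync(entries, tombstones, start_of_interest, last_checked):
    updated = [e for e in entries
               if e["published"] >= start_of_interest
               and e["modified"] > last_checked]
    deleted = [t for t in tombstones
               if t["published"] >= start_of_interest
               and t["deleted"] > last_checked]
    return {"updated": updated, "deleted": deleted}
```

With the example from the text: an entry published in 1968 but modified last week comes back as an update, and a 1982 entry deleted yesterday comes back as a tombstone, because both fall inside the 1943-onward window and both changed since the last check.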

So we're back to square one. To really do it right, it needs to be a client-initiated operation, because only the client knows what it has, what it cares about, and when it last checked. Anything else is a hack that will likely not survive.

mike
