On 2006/06/07, at 9:03 AM, James Holderness wrote:
> As for machine->machine communication, if these feeds aren't meant
> for desktop aggregators then does it really matter that they
> function differently? You can describe one algorithm for use in
> machine->machine communication and another for use by desktop
> aggregators downloading "regular" feeds. Both can use the same link
> relations because they should never come into contact with each
> other. Having said that, I still don't see how a machine->machine
> algorithm for retrieving a paged feed can be different from your
> current feed history algorithm and still be useful.
I don't see a clean split between the machine-to-machine and desktop
aggregator cases; for example, an incremental-style feed can be
useful both on the desktop (to make sure I see all of your blog
entries) and in automated processes (to make sure that my program
doesn't miss a critical event if it has some downtime or a loss of
connectivity).
Similarly, some of the cases I've heard for paging-style feeds are
with desktop clients (e.g., "get me the next results, please") and
some are with processes (e.g., processing search results automatically).
The difference has more to do with a) what guarantees the server
wants to provide, and b) what resources they're willing to devote
towards meeting those guarantees.
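To make the contrast concrete, the two styles can be sketched as client-side algorithms. This is a toy sketch: the `fetch` callback, the dict shapes, and the "prev-archive"/"next" relation names are illustrative assumptions, not quotations from any draft.

```python
# Hedged sketch of the two client strategies discussed above.
# "fetch" takes a URI and returns a parsed feed document; the
# relation names used here are assumptions for illustration.

def incremental_catch_up(feed_uri, seen_ids, fetch):
    """Walk archive links backwards until reaching entries the
    client already has, so nothing is ever missed."""
    new_entries = []
    uri = feed_uri
    while uri:
        doc = fetch(uri)
        unseen = [e for e in doc["entries"] if e["id"] not in seen_ids]
        new_entries.extend(unseen)
        # Stop at the first page containing something already seen:
        # because archives are stable, everything older is known too.
        if len(unseen) < len(doc["entries"]):
            break
        uri = doc["links"].get("prev-archive")
    return new_entries

def paged_browse(feed_uri, fetch, max_pages=2):
    """Fetch pages on demand ("get me the next results, please");
    no completeness guarantee is implied."""
    results, uri = [], feed_uri
    for _ in range(max_pages):
        if not uri:
            break
        doc = fetch(uri)
        results.extend(doc["entries"])
        uri = doc["links"].get("next")
    return results
```

The difference in guarantees shows up in the stopping condition: the incremental walk terminates only when it has provably caught up, while the paged walk terminates whenever the client loses interest.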
> Let's say I was a search engine returning paged results. A search is
> performed that returns 200 results. I return 20 pages, 10 results
> per page. The first time around, a client supporting the feed history
> algorithm would retrieve all 20 pages, no problem. So far I see no
> difference between how a desktop aggregator would behave and how
> machine->machine communication would function.
>
> The second time the client connects (assuming there is a second
> time), it sends through an ETag and/or Last-Modified date so the
> search engine knows which results it already has. Say there are 3
> new results since the previous retrieval. Either the search engine
> is smart enough to just return those 3 results, or it's going to
> ignore the ETag and return everything - 21 pages, 10 results per
> page, with the new items potentially anywhere.
>
> As a desktop aggregator, I guarantee you I'm not going to want to
> download 20+ pages every hour just to find the 3 new items that
> *might* be there. Fortunately the feed history algorithm would stop
> me after the first page, and I'm thankful for that. Would
> machine->machine communication be any different? Would they really
> want to download every single one of those 203 results just to find
> the 3 new items?
These are pretty much the assumptions that I was making previously.
The degree of precision that FH currently provides isn't desirable
for search results. Feed History also requires that the server
maintain state about a particular feed, which is unworkable here; to
implement it for search results, a server would have to mint a whole
new set of feed documents for every query and keep them around.
That's not workable for most search engines (Yahoo, Google, Amazon,
whatever), so they need another option -- one that's clearly
distinct from FH.
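For what it's worth, the revalidation step in the quoted scenario can be sketched as a toy in-memory exchange. Modelling the server as a function and deriving the ETag from a hash of the body are illustrative assumptions, not a claim about how any real engine works:

```python
# Toy sketch of the conditional GET from the search-engine scenario.
# The "server" is a function; hashing the body to mint an ETag is an
# assumption made for illustration only.

import hashlib

def serve(feed_body, if_none_match=None):
    etag = '"%s"' % hashlib.sha1(feed_body.encode()).hexdigest()
    if if_none_match == etag:
        return 304, etag, None       # client is current: no body sent
    return 200, etag, feed_body      # first fetch, or anything changed

status, etag, _ = serve("results v1")             # initial fetch: 200
status, _, _ = serve("results v1", etag)          # revalidation: 304, cheap
status, _, _ = serve("results v1 + 3 new", etag)  # 3 new results: full 200
```

The last line is the problem being discussed: any change anywhere invalidates the whole representation, so without something like FH the client is back to downloading every page.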
This brings me to my other motivation -- I found that most people who
use "previous" and "next" don't understand the assumptions that FH
makes about archive stability, and point those relations at URIs like
"http://example.org/feed.atom?page=3". That will break the FH
algorithm badly, reducing the value of the mechanism as a whole,
because people will stop trusting it. The link relation for
implementing the incremental approach needs to have the stability
semantics baked in and explicit.
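A toy illustration of why query-parameter paging violates that stability assumption (newest-first ordering and a page size of 10 are assumptions made for the example):

```python
# Toy model of "?page=N" URIs: each page is a slice of a newest-first
# entry list, so prepending new entries shifts every page's contents.

def page(entries, n, size=10):
    """Return page n (1-based) of a newest-first entry list."""
    return entries[(n - 1) * size : n * size]

feed = [f"entry-{i}" for i in range(200, 0, -1)]           # newest first
before = page(feed, 3)                                     # what page 3 held
feed = [f"entry-{i}" for i in range(203, 200, -1)] + feed  # 3 new entries arrive
after = page(feed, 3)                                      # same URI, new contents
# Every entry has moved three positions, so a client that skips pages
# it believes it has already seen will silently miss entries.
```

A stable archive design avoids this by giving each archive document a permanent URI whose contents never change once published.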
--
Mark Nottingham http://www.mnot.net/