On 2006/06/07, at 9:03 AM, James Holderness wrote:

> As for machine->machine communication, if these feeds aren't meant for desktop aggregators then does it really matter that they function differently? You can describe one algorithm for use in machine->machine communication and another for use by desktop aggregators downloading "regular" feeds. Both can use the same link relations because they should never come into contact with each other. Having said that, I still don't see how a machine->machine algorithm for retrieving a paged feed can be different from your current feed history algorithm and still be useful.

I don't see a clean split between machine-to-machine vs. desktop aggregator cases; for example, an incremental-style feed can be useful both on the desktop (to make sure I see all of your blog entries) as well as with processes (to make sure that my program doesn't miss a critical event if it has some downtime or loss of connectivity).

Similarly, some of the cases I've heard for paging-style feeds are with desktop clients (e.g., "get me the next results, please") and some are with processes (e.g., processing search results automatically).

The difference has more to do with a) what guarantees the server wants to provide, and b) what resources they're willing to devote towards meeting those guarantees.

> Let's say I was a search engine returning paged results. A search is performed that returns 200 results. I return 20 pages, 10 results per page. The first time around, a client supporting the feed history algorithm would retrieve all 20 pages, no problem. So far I see no difference between how a desktop aggregator would behave and how machine->machine communication would function.

> The second time the client connects (assuming there is a second time) it sends through an ETag and/or last-modified date so the search engine knows which results it already has. Say there are 3 new results since the previous retrieval. Either the search engine is smart enough to just return those 3 results, or it's going to ignore the ETag and return everything: 21 pages, 10 results per page, new items could be anywhere.

> As a desktop aggregator I guarantee you I'm not going to want to download 20+ pages every hour just to find the 3 new items that *might* be there. Fortunately the feed history algorithm would stop me after the first page, and I'm thankful for that. Would machine->machine communication be any different? Would they really want to download every single one of those 203 results just to find the 3 new items?
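For reference, the stop-after-the-first-page behaviour being described is roughly the following. This is only a sketch, not anything normative; parse_atom, the seen-ID store, and the "previous" relation name are placeholders:

    import urllib.request
    import urllib.error

    def poll_feed(uri, seen_ids, cache):
        # Walk a paged feed, newest page first, stopping as soon as
        # we reach an entry we've already seen. Conditional GETs keep
        # unchanged pages cheap.
        new_entries = []
        while uri:
            req = urllib.request.Request(uri)
            if uri in cache:                      # fetched this page before
                etag, modified = cache[uri]
                if etag:
                    req.add_header("If-None-Match", etag)
                if modified:
                    req.add_header("If-Modified-Since", modified)
            try:
                resp = urllib.request.urlopen(req)
            except urllib.error.HTTPError as err:
                if err.code == 304:               # page unchanged; nothing new
                    break
                raise
            cache[uri] = (resp.headers.get("ETag"),
                          resp.headers.get("Last-Modified"))
            feed = parse_atom(resp.read())        # placeholder Atom parser
            for entry in feed.entries:
                if entry.id in seen_ids:          # caught up; stop paging
                    return new_entries
                seen_ids.add(entry.id)
                new_entries.append(entry)
            uri = feed.link("previous")           # next-older page, or None
        return new_entries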

These are pretty much the assumptions that I was making previously. The degree of precision that FH currently provides isn't desirable for search results. Feed History also requires that the server maintain state about a particular feed, which is unworkable here; to implement feed history for search results, a server would have to mint a whole new set of feed documents for every query, and keep them around. That isn't practical for most search engines (Yahoo, Google, Amazon, whatever), so they need another option -- one that's clearly distinct from FH.
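To make that concrete: a search engine's "page 3" is typically just a slice of the live result set, recomputed on every request, so page boundaries shift whenever new results arrive. Something like this (run_search is a placeholder for whatever the backend does):

    def page_of_results(query, page, per_page=10):
        # Stateless paging: recompute the slice from the live result
        # set on every request; no per-query documents are stored.
        results = run_search(query)       # placeholder search backend
        start = (page - 1) * per_page
        return results[start:start + per_page]

    # If 3 new results land at the top of the ranking, every page
    # boundary shifts by 3: the entry that used to open page 3 is
    # now partway down it. FH assumes archive pages are stable, which
    # here would mean freezing and storing documents per query.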

This brings me to my other motivation -- I found that most people who use "previous" and "next" don't understand the assumptions that FH makes about archive stability, and point them at URIs like "http://example.org/feed.atom?page=3". That will break the FH algorithm badly, reducing the value of the mechanism as a whole, because people will stop trusting it. The link relation for implementing the incremental approach needs to have the stability semantics baked in and explicit.
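In other words, a client should be able to decide what's safe to cache from the relation name alone, never by guessing from the shape of the URI. Roughly like this -- the relation name here is illustrative only, nothing is settled:

    # Relations that promise their target's contents never change.
    STABLE_RELS = {"prev-archive"}        # illustrative name only

    def next_archive_uri(feed):
        # feed.links is assumed to be a list of (rel, href) pairs
        # from a parsed feed document.
        for rel, href in feed.links:
            if rel in STABLE_RELS:
                return href               # safe to fetch once and cache
        return None                       # plain paging; assume nothing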

--
Mark Nottingham     http://www.mnot.net/
