On 2006/06/07, at 9:03 AM, James Holderness wrote:
> As for machine->machine communication, if these feeds aren't meant
> for desktop aggregators then does it really matter that they
> function differently? You can describe one algorithm for use in
> machine->machine communication and another for use by desktop
> aggregators downloading "regular" feeds. Both can use the same link
> relations because they should never come into contact with each
> other. Having said that, I still don't see how a machine->machine
> algorithm for retrieving a paged feed can be different from your
> current feed history algorithm and still be useful.
I don't see a clean split between the machine-to-machine and desktop
aggregator cases; for example, an incremental-style feed can be
useful both on the desktop (to make sure I see all of your blog
entries) and in automated processes (to make sure that my program
doesn't miss a critical event if it has some downtime or a loss of
connectivity).
Similarly, some of the cases I've heard for paging-style feeds are
with desktop clients (e.g., "get me the next results, please") and
some are with processes (e.g., processing search results automatically).
The difference has more to do with a) what guarantees the server
wants to provide, and b) what resources they're willing to devote
towards meeting those guarantees.
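To make the contrast concrete, the two styles can be sketched as client-side algorithms. This is a toy sketch: the `fetch` callback, the dict shapes, and the "prev-archive"/"next" relation names are illustrative assumptions, not quotations from any draft.

```python
# Hedged sketch of the two client strategies discussed above.
# "fetch" takes a URI and returns a parsed feed document; the
# relation names used here are assumptions for illustration.

def incremental_catch_up(feed_uri, seen_ids, fetch):
    """Walk archive links backwards until reaching entries the
    client already has, so nothing is ever missed."""
    new_entries = []
    uri = feed_uri
    while uri:
        doc = fetch(uri)
        unseen = [e for e in doc["entries"] if e["id"] not in seen_ids]
        new_entries.extend(unseen)
        # Stop at the first page containing something already seen:
        # because archives are stable, everything older is known too.
        if len(unseen) < len(doc["entries"]):
            break
        uri = doc["links"].get("prev-archive")
    return new_entries

def paged_browse(feed_uri, fetch, max_pages=2):
    """Fetch pages on demand ("get me the next results, please");
    no completeness guarantee is implied."""
    results, uri = [], feed_uri
    for _ in range(max_pages):
        if not uri:
            break
        doc = fetch(uri)
        results.extend(doc["entries"])
        uri = doc["links"].get("next")
    return results
```

The difference in guarantees shows up in the stopping condition: the incremental walk terminates only when it has provably caught up, while the paged walk terminates whenever the client loses interest.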
> Let's say I was a search engine returning paged results. A search is
> performed that returns 200 results. I return 20 pages, 10 results
> per page. The first time around, a client supporting the feed history
> algorithm would retrieve all 20 pages, no problem. So far I see no
> difference between how a desktop aggregator would behave and how
> machine->machine communication would function.
>
> The second time the client connects (assuming there is a second
> time), it sends through an ETag and/or Last-Modified date so the
> search engine knows which results it already has. Say there are 3
> new results since the previous retrieval. Either the search engine
> is smart enough to just return those 3 results, or it's going to
> ignore the ETag and return everything - 21 pages, 10 results per
> page, with the new items potentially anywhere.
>
> As a desktop aggregator, I guarantee you I'm not going to want to
> download 20+ pages every hour just to find the 3 new items that
> *might* be there. Fortunately the feed history algorithm would stop
> me after the first page, and I'm thankful for that. Would
> machine->machine communication be any different? Would they really
> want to download every single one of those 203 results just to find
> the 3 new items?
These are pretty much the assumptions that I was making previously.
The degree of precision that FH currently provides isn't desirable
for search results. Feed History also requires that the server
maintain state about a particular feed, which is unworkable here; to
implement it for search results, a server would have to mint a whole
new set of feed documents for every query and keep them around.
That's not workable for most search engines (Yahoo, Google, Amazon,
whatever), so they need another option -- one that's clearly
distinct from FH.
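For what it's worth, the revalidation step in the quoted scenario can be sketched as a toy in-memory exchange. Modelling the server as a function and deriving the ETag from a hash of the body are illustrative assumptions, not a claim about how any real engine works:

```python
# Toy sketch of the conditional GET from the search-engine scenario.
# The "server" is a function; hashing the body to mint an ETag is an
# assumption made for illustration only.

import hashlib

def serve(feed_body, if_none_match=None):
    etag = '"%s"' % hashlib.sha1(feed_body.encode()).hexdigest()
    if if_none_match == etag:
        return 304, etag, None       # client is current: no body sent
    return 200, etag, feed_body      # first fetch, or anything changed

status, etag, _ = serve("results v1")             # initial fetch: 200
status, _, _ = serve("results v1", etag)          # revalidation: 304, cheap
status, _, _ = serve("results v1 + 3 new", etag)  # 3 new results: full 200
```

The last line is the problem being discussed: any change anywhere invalidates the whole representation, so without something like FH the client is back to downloading every page.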
This brings me to my other motivation -- I found that most people who
use "previous" and "next" don't understand the assumptions that FH
makes about archive stability, and point those relations at URIs like
"http://example.org/feed.atom?page=3". That will break the FH
algorithm badly, reducing the value of the mechanism as a whole,
because people will stop trusting it. The link relation for
implementing the incremental approach needs to have the stability
semantics baked in and explicit.
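A toy illustration of why query-parameter paging violates that stability assumption (newest-first ordering and a page size of 10 are assumptions made for the example):

```python
# Toy model of "?page=N" URIs: each page is a slice of a newest-first
# entry list, so prepending new entries shifts every page's contents.

def page(entries, n, size=10):
    """Return page n (1-based) of a newest-first entry list."""
    return entries[(n - 1) * size : n * size]

feed = [f"entry-{i}" for i in range(200, 0, -1)]           # newest first
before = page(feed, 3)                                     # what page 3 held
feed = [f"entry-{i}" for i in range(203, 200, -1)] + feed  # 3 new entries arrive
after = page(feed, 3)                                      # same URI, new contents
# Every entry has moved three positions, so a client that skips pages
# it believes it has already seen will silently miss entries.
```

A stable archive design avoids this by giving each archive document a permanent URI whose contents never change once published.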
--
Mark Nottingham http://www.mnot.net/