Antone Roundy wrote:
> [polling] would result in bandwidth usage being spread over time
> better than push, assuming push servers push new entries out to
> everyone as quickly as possible after they're published.
At PubSub, we operate both a push and a pull service. We know from
experience that the push service uses drastically less bandwidth per
user than the polling service does. This is not based on theory -- rather,
the statement is based on empirical evidence from running both services
side by side. The bandwidth savings of the push service come from the fact
that we don't have to handle the poll requests themselves and we never
publish the same item twice to a single subscriber as is inevitably the case
with a multi-entry polled RSS/Atom file.
In order to demonstrate polling at its most efficient, I defined
RFC3229+feed and convinced a number of folks to implement it. This HTTP
extension eliminates the problem of sending multiple copies of messages to
people, and the resulting drop in bandwidth requirements for those who poll
and use RFC3229+feed was dramatic[1] -- however, the RFC3229+feed service
still consumes more bandwidth than the push service. Once again, this ain't
theory, it is experience.
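To make the RFC3229+feed exchange concrete, here is a minimal sketch of the server side of a poll. The client sends the ETag it last saw in If-None-Match along with "A-IM: feed"; a server that supports the extension answers "226 IM Used" with only the entries the client has not yet received. The function name and the ETag scheme (a simple count of entries published so far) are assumptions made for illustration only, not how any particular server does it.

```python
def handle_poll(entries, headers):
    """Answer one poll. `entries` is a list of entry ids, oldest first.

    Assumption for illustration: the feed's ETag is just the number of
    entries published so far, so it tells us what the client already has.
    """
    current_etag = f'"{len(entries)}"'
    seen = headers.get("If-None-Match")

    # Nothing new since the client's last poll.
    if seen == current_etag:
        return 304, {"ETag": current_etag}, []

    # Did the client offer to accept the "feed" instance manipulation?
    offered = [t.strip() for t in headers.get("A-IM", "").split(",")]
    if "feed" in offered and seen is not None:
        try:
            already_sent = int(seen.strip('"'))
        except ValueError:
            already_sent = None
        if already_sent is not None and 0 <= already_sent <= len(entries):
            # 226 IM Used: send only the entries the client has not seen.
            return 226, {"ETag": current_etag, "IM": "feed"}, entries[already_sent:]

    # Plain polling: the whole feed goes out again, repeats and all.
    return 200, {"ETag": current_etag}, list(entries)
```

Even in the 226 case the client still has to send the request and the server still has to answer it, which is the overhead a push service avoids.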
> I've lost track of what aspect of Atom's architecture you
> are saying is bound to polling at the expense of push.
There are several "feed-oriented" or "polling-oriented" discussions
going on:
1. The requirement that feeds not be permitted to contain multiple
entries that have the same atom:id. This requirement prevents push-based
systems from using feeds as "logs" or "histories" of messages sent. For
instance, at PubSub, the Atom file which is associated with every
subscription contains a stream of entries that are the entries that either
were or would have been sent to the client if it had been connected. The
first thing a client does on starting up is to read the Atom file in order
to synchronize state with the server. Thus, the feed is actually just a
trace of the messages sent. This probably sounds weird if you're
feed-focused; however, it makes perfect sense if you are pushing entries.
2. Significance of the order of entries in feeds. A push-fed client
only sees feeds in edge cases (like the PubSub synchronization usage
discussed above). Because a push-fed client receives a stream of entries,
not feeds, it can only see chronological order unless order is encoded
within the entry itself. Any requirement that document order is significant
(and some are still suggesting it is) means that a push-based system must
push entire feeds rather than individual entries. The bandwidth costs of
doing so would be completely unacceptable. (Things like ordered lists should
be handled by having entries that contain ordered lists. Exploiting the
anecdotal attributes of the atom:feed document and the practices of current
aggregators is NOT reasonable.)
3. The absence of a revision id or atom:modified. The feed-ists
don't see the need for revision numbers in part because they are focused on
feeds rather than entries. If you live in the world of entries -- as a push
system does -- then the need to distinguish multiple versions of the same
entry becomes much, much more important.
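Points 1 and 3 can be illustrated together with a small sketch of a client that synchronizes by replaying a feed used as a log. The log may legitimately contain the same atom:id more than once, and the client needs something in the entry itself to decide which copy is current. The `revision` field below is hypothetical (it stands in for the missing revision id or atom:modified), and the class and function names are likewise assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    atom_id: str
    revision: int  # hypothetical field: a revision number carried in the entry
    title: str


def replay(log):
    """Replay a trace of pushed entries, keeping the latest revision per atom:id.

    The order in which entries arrive does not matter, because the decision
    is made from data inside each entry rather than from document order.
    """
    state = {}
    for entry in log:
        current = state.get(entry.atom_id)
        if current is None or entry.revision > current.revision:
            state[entry.atom_id] = entry
    return state
```

Without a field like `revision`, the only way to choose between two entries with the same atom:id would be their position in the feed, which a push-fed client never sees.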
I have an additional set of concerns that are based on our
experience as generators of aggregate feeds. The problem is that we are at
the mercy of the creators of feeds... The rules say, as they should, that
we're not supposed to change the atom:id of something that we extract from a
feed. Thus, when we generate results, we should use the atom:ids that we
found in the source entries. However, if someone fails to create truly
unique atom:ids, we are left wondering what to do... We aren't supposed to
generate a feed with repeated atom:ids according to the spec; however, our
users will want to see all entries that match their search terms. We have to
either conform to the Atom spec by dropping one of the entries from the
results -- and thus serve our users less well -- or generate new
atom:ids in order to serve our users but violate the specification. In
other words, we can't win.
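The two losing options can be sketched side by side. Each match is modeled as a (source feed URL, atom:id, title) tuple; the function names and the scheme for minting replacement ids (source feed URL plus "#" plus the original id) are hypothetical and only meant to show the trade-off.

```python
def drop_duplicates(matches):
    """Option 1: conform to the spec by keeping only the first entry per atom:id."""
    seen, kept = set(), []
    for source_feed, atom_id, title in matches:
        if atom_id not in seen:
            seen.add(atom_id)
            kept.append((atom_id, title))
    return kept


def remint_ids(matches):
    """Option 2: keep every match, but replace colliding atom:ids with new ones."""
    counts = {}
    for _, atom_id, _ in matches:
        counts[atom_id] = counts.get(atom_id, 0) + 1
    return [
        # Hypothetical minting scheme; untouched when the id is already unique.
        (atom_id if counts[atom_id] == 1 else f"{source_feed}#{atom_id}", title)
        for source_feed, atom_id, title in matches
    ]
```

The first function silently loses a matching entry; the second changes ids that the rules say must not be changed.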
Similarly, the prohibition against multiple instances of an atom:id
means that we can't implement a "show me all versions" feature using Atom as
the packaging format for the results. (No, we're not planning to do this.
But, it would be good to be able to do it.)
bob wyman
[1] http://bobwyman.pubsub.com/main/2004/10/massive_bandwid.html