Re: feedsets and atom:source

Bob Wyman Tue, 17 Feb 2009 15:59:37 -0800

On Mon, Feb 16, 2009 at 1:35 PM, Erik Wilde <[email protected]> wrote:
> how to create a query language for feed


At least three aspects of Atom serve to facilitate precisely this
application (if I understand you correctly...).
1. Aggregate feeds are facilitated by having the atom.source element that
allows feed metadata to be copied into the entry. If an entry contains an
atom.source, then that entry, when read in an aggregated feed, can have the
same semantics as if it had been read from its source feed.
2. Atom assigns no significance to the order of entries within a feed.
Thus, an atom.entry's semantics depend *only* on what is in the entry and,
if the entry has no atom.source, on the feed's metadata.
3. Atom entries are "top-level" objects (unlike RSS items) and thus can be
read as self contained, complete documents within any feed or even outside a
feed.

Given these three aspects of Atom, it is quite possible to build a robust
feed query language that allows statements like: "FROM feeda, feedb, feedn
SELECT all entries that contain 'foobar'".
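
As a rough illustration of such a query (a minimal sketch, not a proposed
design), the Python below scans several Atom documents, keeps the entries
whose text contains a search string, and copies feed metadata into an
atom:source element when an entry does not already have one. The function
name select_entries, the SOURCE_FIELDS subset, and the plain substring match
are all assumptions made for this example.

```python
import copy
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM)

# Feed-level metadata copied into atom:source (a subset chosen for this sketch).
SOURCE_FIELDS = ("id", "title", "updated", "link", "author", "rights")


def q(tag):
    """Return the namespace-qualified name of an Atom element."""
    return f"{{{ATOM}}}{tag}"


def select_entries(feed_xmls, needle):
    """Return copies of the entries whose text contains `needle`.

    Each returned entry carries an atom:source, so it keeps the metadata
    of its original feed when placed in an aggregate feed.
    """
    results = []
    for xml_text in feed_xmls:
        feed = ET.fromstring(xml_text)
        for entry in feed.findall(q("entry")):
            # Naive match over all text inside the entry.
            if needle not in "".join(entry.itertext()):
                continue
            entry = copy.deepcopy(entry)
            # Only add atom:source if the entry does not already have one.
            if entry.find(q("source")) is None:
                source = ET.SubElement(entry, q("source"))
                for field in SOURCE_FIELDS:
                    for child in feed.findall(q(field)):
                        source.append(copy.deepcopy(child))
            results.append(entry)
    return results
```

Calling select_entries([feed_a_xml, feed_b_xml], "foobar") would return a
list of entry elements that can be appended to a result feed in any order,
since each one identifies its origin through its own atom:source.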

It seems, however, that the thing that is hanging you up here is that you
want to be able to encode a listing of the set of feeds which were scanned
in the process of building the result feed for a query. I think this is what
you're trying to accomplish with the link="via" stuff.

The problem, of course, is that this business of listing source feeds
doesn't scale -- thus, it isn't something that we should be trying to
standardize. For queries that only span a small number of feeds, it might
seem reasonable to explicitly list the source feeds. However, most "feed
search" engines these days work with many millions of feeds. Thus, an
aggregate result feed that contained only a handful of entries might need to
have a listing of tens of millions of feeds -- whether or not the result
feed contained any of their entries.

It has been pointed out by others that you are misreading the definition of
link="via". Hopefully, you won't try to overload it. However, you could
define a custom relationship, perhaps link="http://example.com/queried_feed",
for applications with small numbers of queried feeds. Alternatively, you
might create some custom XML file that served as a "query specification" and
then link to that via some relationship... Of course, this relationship
should be associated with the feed and not with any of the entries within
the feed. Each entry in the feed would still state its "origin" via the
atom.source element. (Note: The list of queried feeds really is an attribute
of the result feed, not the entries in it, thus, this shouldn't be a big
problem.)
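
For the small-number-of-feeds case, a sketch of what this might look like
follows. It builds a result feed whose queried feeds are listed as
feed-level atom:link elements, using the example relation URI from above as
the value of the link's rel attribute. The function name build_result_feed
and its parameters are hypothetical, and the relation itself is only the
illustrative one suggested here, not a registered link relation.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"

# Custom link relation taken from the example URI in the text above.
QUERIED_FEED_REL = "http://example.com/queried_feed"


def build_result_feed(feed_id, title, updated, queried_feed_urls, entries=()):
    """Build an aggregate feed that lists its queried feeds at feed level."""
    feed = ET.Element(f"{{{ATOM}}}feed")
    ET.SubElement(feed, f"{{{ATOM}}}id").text = feed_id
    ET.SubElement(feed, f"{{{ATOM}}}title").text = title
    ET.SubElement(feed, f"{{{ATOM}}}updated").text = updated
    # One feed-level link per queried feed; nothing is added to the entries.
    for url in queried_feed_urls:
        ET.SubElement(
            feed, f"{{{ATOM}}}link", {"rel": QUERIED_FEED_REL, "href": url}
        )
    # Entries are appended unchanged and keep their own atom:source.
    feed.extend(entries)
    return feed
```

The number of link elements grows with the number of queried feeds rather
than with the number of result entries, which is exactly why this approach
is only reasonable for small feed sets.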

Unfortunately, there is no way to encode an entry that shows both the
entry's source and the reason why it has been included in an
aggregate feed. Doing this would require an ability to "add data" to an
entry after it is originally published. But, Atom provides no mechanisms for
adding data to entries once they are published -- except for the business
about inserting atom.source if not present. During the process of defining
the Atom format, we discussed at length a number of alternatives that would
have permitted adding more "history" to entries but we couldn't get any
agreement on what to do. The discussions typically included ideas for
describing the "provenance" of an entry...

Are you sure you *really* want to include a listing of the queried feeds?
What would you do if you had millions of feeds to query?

bob wyman
