On Mon, Feb 16, 2009 at 1:35 PM, Erik Wilde <[email protected]> wrote:
> how to create a query language for feed
At least three aspects of Atom serve to facilitate precisely this application (if I understand you correctly...).

1. Aggregate feeds are facilitated by the atom.source element, which allows feed metadata to be copied into the entry. If an entry contains an atom.source, then that entry, when read in an aggregated feed, can have the same semantics as if it had been read from its source feed.
2. Atom prohibits assigning any semantics to an entry's order within a feed. Thus, an atom.entry's semantics depend *only* on what is in the entry, plus the feed's metadata if the entry has no atom.source.
3. Atom entries are "top-level" objects (unlike RSS items) and thus can be read as self-contained, complete documents within any feed or even outside a feed.

Given these three aspects of Atom, it is quite possible to build a robust feed query language that allows statements like: "FROM feeda, feedb, feedn SELECT all entries that contain 'foobar'".

It seems, however, that the thing that is hanging you up here is that you want to be able to encode a listing of the set of feeds which were scanned in the process of building the result feed for a query. I think this is what you're trying to accomplish with the link="via" stuff. The problem, of course, is that this business of listing source feeds doesn't scale -- thus, it isn't something that we should be trying to standardize. For queries that only span a small number of feeds, it might seem reasonable to explicitly list the source feeds; however, most "feed search" engines these days work with many millions of feeds. Thus, an aggregate result feed that contained only a handful of entries might need to have a listing of tens of millions of feeds -- whether or not the result feed contained any of their entries.

It has been pointed out by others that you are misreading the definition of link="via". Hopefully, you won't try to overload it.
However, you could define a custom relationship, perhaps link="http://example.com/queried_feed", for applications with small numbers of queried feeds. Alternatively, you might create some custom XML file that served as a "query specification" and then link to that via some relationship... Of course, this relationship should be associated with the feed and not with any of the entries within the feed. Each entry in the feed would still state its "origin" via the atom.source element. (Note: The list of queried feeds really is an attribute of the result feed, not of the entries in it, so this shouldn't be a big problem.)

Unfortunately, there is no way to encode an entry that shows both the entry's source and the reason why it has been included in an aggregate feed. Doing this would require an ability to "add data" to an entry after it is originally published. But Atom provides no mechanisms for adding data to entries once they are published -- except for the business about inserting atom.source if not present. During the process of defining the Atom format, we discussed at length a number of alternatives that would have permitted adding more "history" to entries, but we couldn't get any agreement on what to do. The discussions typically included ideas for describing the "provenance" of an entry...

Are you sure you *really* want to include a listing of the queried feeds? What would you do if you had millions of feeds to query?

bob wyman
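
For illustration only, here is a minimal sketch in Python (standard library only) of the small-scale case discussed above: select the entries that contain a string from a few feeds, insert atom.source where it is missing, and list the queried feeds on the result feed rather than on its entries. The function names `query` and `with_source` are hypothetical, the relation URI is the example one from the text, and the feeds are assumed to be already fetched and parsed. Only a subset of the feed metadata is copied into the source element, and the result feed omits its own required id and updated elements for brevity.

```python
import copy
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM)

# Hypothetical custom relation, taken from the example URI in the text.
QUERIED_FEED_REL = "http://example.com/queried_feed"

# Feed-level elements copied into the source element (a subset, for brevity).
SOURCE_FIELDS = ("id", "title", "updated")


def q(tag):
    """Qualify a tag name with the Atom namespace."""
    return f"{{{ATOM}}}{tag}"


def with_source(feed, entry):
    """Return a copy of entry; add a source element only if it has none."""
    entry = copy.deepcopy(entry)
    if entry.find(q("source")) is None:
        source = ET.SubElement(entry, q("source"))
        for name in SOURCE_FIELDS:
            field = feed.find(q(name))
            if field is not None:
                source.append(copy.deepcopy(field))
    return entry


def query(feeds, needle):
    """Build a result feed from feeds, a mapping of feed URL -> <feed> element.

    An entry matches when needle occurs in its text content
    (attribute values are not searched in this sketch).
    """
    result = ET.Element(q("feed"))
    ET.SubElement(result, q("title")).text = f"Entries containing '{needle}'"
    # The list of queried feeds belongs to the result feed, not to its entries.
    for url in feeds:
        ET.SubElement(result, q("link"), rel=QUERIED_FEED_REL, href=url)
    for feed in feeds.values():
        for entry in feed.findall(q("entry")):
            if needle in "".join(entry.itertext()):
                result.append(with_source(feed, entry))
    return result
```

Calling `query({"http://example.com/a.atom": feed_a}, "foobar")` with a parsed feed returns a new feed element that can be serialized with `ET.tostring`. One link per queried feed is exactly the part that stops being practical once the number of feeds is large.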
