I've got another example of a selfish feed that produces dynamic content, which will cause many duplicate entries to float around the blogosphere. The feed in question is an RSS feed; nonetheless, I think we must expect people to try the same stupid tricks with Atom feeds. Check out:
http://www.b-eye-network.com/xml/articles.php

What you'll get is a feed with entries that look something like the one at the bottom of this page. The interesting thing to note is that the item has a <link> element with the URL:

<link>http://www.b-eye-network.com/view/index.php?cid=836&fc=0&frss=1&ua=Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322; Alexa Toolbar)</link>

What's happened here is that the site has appended my User Agent to the URL in the link. I assume that this is to allow some kind of tracking. However, the impact is that the contents of the feed depend on what tool you use to read the feed. If you access the feed, you will undoubtedly get different content than I did... For instance, if PubSub's crawler had read the feed, the value of the "ua" parameter in the URL would have been different and the URL would have read:

<link>http://www.b-eye-network.com/view/index.php?cid=836&fc=0&frss=1&ua=PubSub.com RSS reader - http://www.pubsub.com/</link>

If this feed is read by more than one synthetic feed generator, or if items are copied from this feed to another, it is inevitable that we'll have multiple copies of the item floating around, and we'll have very little means of determining which one is authoritative -- essentially they all are. It would be handy to have a "dynamic content flag" that allows us to ignore this stuff...

This business of people including dynamic content in their feeds for selfish purposes is making it very difficult to build a decent infrastructure for distributing and caching RSS/Atom entries... We've got a "tragedy of the commons" situation going on here. The much-too-respectable SEO crowd is trying to seek profit at the expense of the network at large... Because they can.
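To make the duplication concrete, here is a minimal Python sketch of how an aggregator might collapse the two copies. It assumes that "ua" is the only reader-dependent parameter and that the remaining link identifies the entry; neither is confirmed by the site. The names canonical_link, dedupe and VOLATILE_PARAMS are hypothetical, not part of any existing tool.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: "ua" is the only query parameter that varies per reader.
VOLATILE_PARAMS = {"ua"}

def canonical_link(link):
    """Return the link with reader-dependent query parameters removed."""
    parts = urlsplit(link.strip())
    # Split on "&" by hand: the user-agent value itself contains ";",
    # spaces and "/", so the raw pairs are left untouched.
    pairs = [p for p in parts.query.split("&") if p]
    kept = [p for p in pairs if p.split("=", 1)[0] not in VOLATILE_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "&".join(kept), ""))

def dedupe(items):
    """Keep the first item seen for each canonical link."""
    seen = {}
    for item in items:
        seen.setdefault(canonical_link(item["link"]), item)
    return list(seen.values())

# The two variants of the same entry quoted above.
msie = ("http://www.b-eye-network.com/view/index.php?cid=836&fc=0&frss=1"
        "&ua=Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; "
        ".NET CLR 1.1.4322; Alexa Toolbar)")
pubsub = ("http://www.b-eye-network.com/view/index.php?cid=836&fc=0&frss=1"
          "&ua=PubSub.com RSS reader - http://www.pubsub.com/")

unique = dedupe([{"link": msie}, {"link": pubsub}])
print(len(unique), canonical_link(msie))
```

This only works because the varying parameter happens to be known here. An aggregator cannot know in general which parameters to strip, which is why a stable identifier or an explicit flag from the publisher would be the better fix.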
bob wyman

====== Full example entry from the feed =============

<item>
  <title>Nanotechnology Basics Defined</title>
  <description>
    <![CDATA[ ADVERTISEMENT - <a href='http://www.b-eye-network.com/adserver-new/adclick.php?bannerid=199&zoneid=27&source=&dest=http%3A%2F%2Finfo7.net%2Fs%2Fhyperion%2F53w%2F2wp' target='_blank'>Find out how Hyperion can help: An Intelligent Approach to Business Intelligence</a><div id="beacon_199" style="position: absolute; left: 0px; top: 0px; visibility: hidden;"><img src='http://www.b-eye-network.com/adserver-new/adlog.php?bannerid=199&clientid=178&zoneid=27&source=&block=0&capping=0&cb=818aa508d0a2ebeec0c37d74b10535d6' width='0' height='0' alt='' style='width: 0px; height: 0px;'></div><br/><br/>Nanotechnology is the Next Big Thing. ]]>
  </description>
  <link>http://www.b-eye-network.com/view/index.php?cid=836&fc=0&frss=1&ua=Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322; Alexa Toolbar)</link>
  <pubDate>Thu, 5 May 2005 00:00:00 MST</pubDate>
</item>
