Here is another example of a selfish feed that produces dynamic content,
which will cause many duplicate entries to float around the blogosphere.
The feed in question is an RSS feed. Nonetheless, I think we must expect
people to try the same stupid tricks with Atom feeds. Check out:

http://www.b-eye-network.com/xml/articles.php

What you'll get is a feed with entries that look something like the one at
the bottom of this page. The interesting thing to note is that the item has
a <link> element with the URL:

  <link>http://www.b-eye-network.com/view/index.php?
   cid=836&fc=0&frss=1&ua=Mozilla/4.0 (compatible; 
    MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322; Alexa Toolbar)</link>

What's happened here is that the site has appended my User Agent to the URL
in the link. I assume that this is to allow some kind of tracking. However,
the impact is that the contents of the feed depend on what tool you use to
read the feed. If you access the feed, you will undoubtedly get different
content than I did. For instance, if PubSub's crawler had read the feed,
the value of the "ua" parameter in the URL would have been different and the
URL would have read:

   <link>http://www.b-eye-network.com/view/index.php?
    cid=836&amp;fc=0&amp;frss=1&amp;ua=PubSub.com RSS reader - 
    http://www.pubsub.com/</link>

If this feed is read by more than one synthetic feed generator or if items
from the feed are copied from this feed to another, it is inevitable that
we'll have multiple copies of the item floating around and we'll have very
little means for determining which one is authoritative -- essentially they
all are. It would be handy to have a "dynamic content flag" that allows us
to ignore this stuff...
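
Lacking such a flag, an aggregator could try to work around this particular
feed by stripping the tracking parameter before comparing links. The
following is only a rough sketch of that idea, not something any crawler
mentioned here is known to do. The function name canonical_link and the
TRACKING_PARAMS set are hypothetical, and it assumes that "ua" is the only
parameter that varies per reader, which holds for the two links shown above
but is a guess for anything else.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: "ua" is the only per-reader parameter in this feed's links.
# Other feeds would need their own (hypothetical) list.
TRACKING_PARAMS = {"ua"}


def canonical_link(link: str) -> str:
    """Return the link with the assumed tracking parameters removed."""
    # The link may arrive as raw feed text with escaped ampersands.
    parts = urlsplit(link.strip().replace("&amp;", "&"))
    # Split the query by hand so that odd values (spaces, semicolons,
    # embedded URLs) are passed through untouched.
    kept = [
        pair
        for pair in parts.query.split("&")
        if pair and pair.split("=", 1)[0] not in TRACKING_PARAMS
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, "&".join(kept), "")
    )


# The two variants quoted above, joined onto one line each.
seen_by_browser = (
    "http://www.b-eye-network.com/view/index.php?"
    "cid=836&fc=0&frss=1&ua=Mozilla/4.0 (compatible; "
    "MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322; Alexa Toolbar)"
)
seen_by_pubsub = (
    "http://www.b-eye-network.com/view/index.php?"
    "cid=836&amp;fc=0&amp;frss=1&amp;ua=PubSub.com RSS reader - "
    "http://www.pubsub.com/"
)

# Both variants reduce to the same key, so they can be treated as one item.
same_item = canonical_link(seen_by_browser) == canonical_link(seen_by_pubsub)
```

This only papers over the <link> element. It does nothing about other
dynamic content in the item, and every new tracking trick would need its own
special case, which is exactly why this approach does not scale.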

This business of people including dynamic content in their feeds for selfish
purposes is making it very difficult to build a decent infrastructure for
distributing and caching RSS/Atom entries... We've got a "tragedy of the
commons" situation going on here. The much-too-respectable SEO crowd is
trying to seek profit at the expense of the network at large... Because they
can.

        bob wyman

====== Full example entry from the feed =============
<item>
  <title>Nanotechnology Basics Defined</title> 
 <description>
 <![CDATA[ ADVERTISEMENT - <a
href='http://www.b-eye-network.com/adserver-new/adclick.php?bannerid=199&amp
;zoneid=27&amp;source=&amp;dest=http%3A%2F%2Finfo7.net%2Fs%2Fhyperion%2F53w%
2F2wp' target='_blank'>Find out how Hyperion can help: An Intelligent
Approach to Business Intelligence</a><div id="beacon_199" style="position:
absolute; left: 0px; top: 0px; visibility: hidden;"><img
src='http://www.b-eye-network.com/adserver-new/adlog.php?bannerid=199&amp;cl
ientid=178&amp;zoneid=27&amp;source=&amp;block=0&amp;capping=0&amp;cb=818aa5
08d0a2ebeec0c37d74b10535d6' width='0' height='0' alt='' style='width: 0px;
height: 0px;'></div><br/><br/>Nanotechnology is the Next Big Thing.
  ]]> 
  </description>
 
  <link>http://www.b-eye-network.com/view/index.php?cid=836&fc=0&frss=1&ua=Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322; Alexa Toolbar)</link> 
  <pubDate>Thu, 5 May 2005 00:00:00 MST</pubDate> 
</item>

