Hello, your Nutch crawler has a bug. It tries to extract new links from the JavaScript parts of websites, but the strings it picks up are not links at all. An example is http://life.winter.cd/stories/838/ . The JavaScript that displays the backlinks contains a (manually maintained) blacklist of domains used to filter referrer spammers, and your crawler follows those "links", which were never intended to be links. That in turn results in tons and tons of 404s in my logfiles, looking like:
GET /stories/306/^http://www\\.vjuror\\.com
GET /stories/306/^http://www\\.thexmlguys\\.com
GET /stories/306/^http://www\\.sudtuiles\\.com

and so on. I have now blocked your crawler as suggested here: http://lucene.apache.org/nutch/bot.html but I thought you might also want to hear about your crawler's annoying bugs. Bye.
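
P.S. In case it helps with tracking this down, here is a minimal sketch in Python of the kind of sanity check I would expect before a string found inside JavaScript is queued as a link. This is not Nutch's actual code, and `looks_like_link` is just a name I made up. It assumes a candidate is only worth following if it parses as an absolute http(s) URL and contains no regex characters such as ^ or a backslash.

```python
from urllib.parse import urlsplit

# Characters that show up in regex patterns like the blacklist entries,
# but that a plain link should not contain (assumption, heuristic only).
SUSPICIOUS_CHARS = ("^", "\\", " ")


def looks_like_link(candidate: str) -> bool:
    """Return True if a string pulled out of JavaScript is plausibly a real link."""
    if any(ch in candidate for ch in SUSPICIOUS_CHARS):
        return False
    try:
        parts = urlsplit(candidate)
    except ValueError:
        # urlsplit rejects some malformed inputs; treat those as "not a link".
        return False
    # Only accept absolute http(s) URLs with a host part.
    return parts.scheme in ("http", "https") and bool(parts.netloc)


candidates = [
    "http://life.winter.cd/stories/838/",
    r"^http://www\\.vjuror\\.com",
    r"^http://www\\.thexmlguys\\.com",
]
accepted = [c for c in candidates if looks_like_link(c)]
```

With a check like this, only the first candidate would be kept and the blacklist patterns would be skipped instead of being requested relative to the page.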