Hello,

Your Nutch crawler has a bug. It tries to extract new links from the
JavaScript on web pages, but some of the strings it picks up are not
links at all. An example is http://life.winter.cd/stories/838/. The
JavaScript that displays the backlinks contains a manually maintained
blacklist of domains, written as regular-expression patterns, that is
used to filter out referrer spammers. Your crawler tries to follow
those "links" even though they were never meant to be links. The
result is a flood of 404s in my log files like these:

GET /stories/306/^http://www\\.vjuror\\.com
GET /stories/306/^http://www\\.thexmlguys\\.com
GET /stories/306/^http://www\\.sudtuiles\\.com

And so on. I have now blocked your crawler as suggested at
http://lucene.apache.org/nutch/bot.html, but I thought you might also
want to hear about this annoying bug in your crawler.
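A minimal Python sketch of the suspected failure mode follows. It is an illustration under stated assumptions, not Nutch's actual parser code. The assumptions are that the extractor treats any quoted string that mentions http:// or starts with / as a link, and that anything without a recognized scheme is appended to the page's directory. The sample script and the names naive_candidates, resolve_like_a_path and plausible_link are hypothetical. plausible_link shows one possible guard: it rejects candidates containing characters that RFC 3986 does not allow unencoded in a URL, such as ^ and backslash.

```python
import re
from urllib.parse import urlsplit

# Hypothetical JavaScript resembling a referrer-spam blacklist:
# the first two entries are regular expressions, not links.
SAMPLE_JS = r'''
var blacklist = [
  "^http://www\\.vjuror\\.com",
  "^http://www\\.thexmlguys\\.com",
  "/stories/307/"
];
'''

# Any double-quoted string literal (escaped quotes not handled; sketch only).
STRING_LITERAL = re.compile(r'"([^"\n]*)"')

# Characters RFC 3986 does not permit unencoded anywhere in a URL.
NOT_URL_CHARS = set('^\\{}|<> `"')


def naive_candidates(js):
    """Assumed naive rule: any literal mentioning http:// or starting with / is a link."""
    return [s for s in STRING_LITERAL.findall(js)
            if "http://" in s or s.startswith("/")]


def resolve_like_a_path(page_dir, candidate):
    """Assumed naive resolution: strings without a known scheme or leading /
    are appended to the page's directory, producing paths like those in the log."""
    if candidate.startswith(("http://", "https://", "/")):
        return candidate
    return page_dir + candidate


def plausible_link(candidate):
    """One possible guard: reject strings that cannot be a valid http(s) URL."""
    if any(ch in NOT_URL_CHARS for ch in candidate):
        return False
    try:
        parts = urlsplit(candidate)
    except ValueError:  # e.g. malformed IPv6 brackets
        return False
    return parts.scheme in ("", "http", "https")


if __name__ == "__main__":
    for cand in naive_candidates(SAMPLE_JS):
        url = resolve_like_a_path("/stories/306/", cand)
        print(url, "kept" if plausible_link(cand) else "rejected")
```

With the naive rules, the regex entries resolve to request paths of the same shape as the 404s above, while the character check discards them before they are queued and still keeps the genuine relative link.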

Bye.
