cairo.ee.ucla.edu - - [19/Jan/2006:13:10:26 -0800] "GET /archives/best/index.html HTTP/1.0" 200 5096 "-" "NutchCVS/0.8-dev (Nutch; http://lucene.apache.org/nutch/bot.html; nutch-agent@lucene.apache.org)"
cairo.ee.ucla.edu - - [19/Jan/2006:13:10:27 -0800] "GET /ftpfiles.html HTTP/1.0" 200 5353 "-" "NutchCVS/0.8-dev (Nutch; http://lucene.apache.org/nutch/bot.html; nutch-agent@lucene.apache.org)"
cairo.ee.ucla.edu - - [19/Jan/2006:13:10:27 -0800] "GET /faqs/medi-cont.html HTTP/1.0" 200 25734 "-" "NutchCVS/0.8-dev (Nutch; http://lucene.apache.org/nutch/bot.html; nutch-agent@lucene.apache.org)"
cairo.ee.ucla.edu - - [19/Jan/2006:13:10:27 -0800] "GET /eclectic/felter/index.html HTTP/1.0" 200 37998 "-" "NutchCVS/0.8-dev (Nutch; http://lucene.apache.org/nutch/bot.html; nutch-agent@lucene.apache.org)"
cairo.ee.ucla.edu - - [19/Jan/2006:13:10:27 -0800] "GET /eclectic/kings/index.html HTTP/1.0" 200 61955 "-" "NutchCVS/0.8-dev (Nutch; http://lucene.apache.org/nutch/bot.html; nutch-agent@lucene.apache.org)"
cairo.ee.ucla.edu - - [19/Jan/2006:13:10:28 -0800] "GET /index.html HTTP/1.0" 200 4542 "-" "NutchCVS/0.8-dev (Nutch; http://lucene.apache.org/nutch/bot.html; nutch-agent@lucene.apache.org)"
cairo.ee.ucla.edu - - [19/Jan/2006:13:10:28 -0800] "GET /gbx.php HTTP/1.0" 200 48551 "-" "NutchCVS/0.8-dev (Nutch; http://lucene.apache.org/nutch/bot.html; nutch-agent@lucene.apache.org)"
cairo.ee.ucla.edu - - [19/Jan/2006:13:10:29 -0800] "GET /eclectic/ellingwood/index.html HTTP/1.0" 200 30049 "-" "NutchCVS/0.8-dev (Nutch; http://lucene.apache.org/nutch/bot.html; nutch-agent@lucene.apache.org)"

That gbx.php is my guestbook, which I've blocked in robots.txt:
http://www.henriettesherbal.com/robots.txt

They hit a bot trap later on and got blocked, but nutch only picked up 3 files after it got the first 403.

Thanks,
Henriette

--
Henriette Kress, AHG
Helsinki, Finland
Henriette's herbal homepage: http://www.henriettesherbal.com
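
For anyone who wants to reproduce the check, here is a minimal sketch of how the requested paths from the log above could be tested against a robots.txt rule set. It uses Python's standard urllib.robotparser, not Nutch's own (Java) parser. The rules in ASSUMED_ROBOTS_TXT are an assumption for illustration only, not the actual contents of the site's robots.txt, and disallowed_hits is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser

# ASSUMPTION: illustrative rules only, not the real robots.txt of the site.
ASSUMED_ROBOTS_TXT = """\
User-agent: *
Disallow: /gbx.php
"""

# Paths requested by the crawler, taken from the access log above.
REQUESTED_PATHS = [
    "/archives/best/index.html",
    "/ftpfiles.html",
    "/faqs/medi-cont.html",
    "/eclectic/felter/index.html",
    "/eclectic/kings/index.html",
    "/index.html",
    "/gbx.php",
    "/eclectic/ellingwood/index.html",
]


def disallowed_hits(robots_txt, user_agent, paths):
    """Return the paths that the given rules forbid for user_agent.

    Hypothetical helper: parses the rules from a string instead of
    fetching them over the network.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [p for p in paths if not parser.can_fetch(user_agent, p)]


if __name__ == "__main__":
    for path in disallowed_hits(ASSUMED_ROBOTS_TXT, "NutchCVS/0.8-dev", REQUESTED_PATHS):
        print("fetched despite Disallow:", path)
```

Under these assumed rules, the only request in the log that a compliant crawler should have skipped is the one for /gbx.php.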