What exactly is the issue here?

Lewis

On Thu, Oct 25, 2012 at 4:59 PM, Alex diNorcia <[email protected]> wrote:
> http://alex.dinorcia.net/robots.txt has been in place and unchanged since
> Aug 24 2004
>
> * i'd also point out that it's crawling poorly to boot. the original link it
> got into the directory with was
> http://alex.dinorcia.net/stuff_i_got_in_emails/?C=M;O=D
> it appears to add the descending order part of the get variables to each
> file and gets a 404 error.
>
> here are some of the 14516 log entries that are not obeying the rules :
> 119.139.27.64 - - [25/Oct/2012:04:22:08 -0400] "GET /stuff_i_got_in_emails/Japanese%20Engrish%204.jpg;O=D HTTP/1.0" 404 246 "-" "HD nutch agent/Nutch-1.1 (Think)"
> 119.139.27.64 - - [25/Oct/2012:05:20:50 -0400] "GET /stuff_i_got_in_emails/LeafBlower.jpg;O=D HTTP/1.0" 404 238 "-" "HD nutch agent/Nutch-1.1 (Think)"
> 119.139.27.64 - - [25/Oct/2012:06:26:43 -0400] "GET /stuff_i_got_in_emails/snowmen3.gif;O=D HTTP/1.0" 404 236 "-" "HD nutch agent/Nutch-1.1 (Think)"
> 119.139.27.64 - - [25/Oct/2012:07:01:49 -0400] "GET /stuff_i_got_in_emails/Everything.About.The.Doctor.jpg;O=D HTTP/1.0" 404 255 "-" "HD nutch agent/Nutch-1.1 (Think)"
> 119.139.27.64 - - [25/Oct/2012:08:12:06 -0400] "GET /stuff_i_got_in_emails/fucked.jpg;O=D HTTP/1.0" 404 234 "-" "HD nutch agent/Nutch-1.1 (Think)"
> 119.139.27.64 - - [25/Oct/2012:08:18:54 -0400] "GET /stuff_i_got_in_emails/H28.gif;O=D HTTP/1.0" 404 231 "-" "HD nutch agent/Nutch-1.1 (Think)"
> 119.139.27.64 - - [25/Oct/2012:08:26:50 -0400] "GET /stuff_i_got_in_emails/Oprahs-Bees.gif;O=D HTTP/1.0" 404 239 "-" "HD nutch agent/Nutch-1.1 (Think)"
> 119.139.27.64 - - [25/Oct/2012:08:50:31 -0400] "GET /stuff_i_got_in_emails/Reindeer_Mural.jpg;O=D HTTP/1.0" 404 242 "-" "HD nutch agent/Nutch-1.1 (Think)"
> 119.139.27.64 - - [25/Oct/2012:09:02:52 -0400] "GET /stuff_i_got_in_emails/snowmen4.gif;O=D HTTP/1.0" 404 236 "-" "HD nutch agent/Nutch-1.1 (Think)"
> 119.139.27.64 - - [25/Oct/2012:09:04:52 -0400] "GET /stuff_i_got_in_emails/ATT00173.jpg;O=D HTTP/1.0" 404 236 "-" "HD nutch agent/Nutch-1.1 (Think)"
> 119.139.27.64 - - [25/Oct/2012:09:22:19 -0400] "GET /stuff_i_got_in_emails/?C=S;O=A HTTP/1.0" 200 159957 "-" "HD nutch agent/Nutch-1.1 (Think)"
> 119.139.27.64 - - [25/Oct/2012:10:55:09 -0400] "GET /stuff_i_got_in_emails/outofthecloset%20(5).jpg;O=D HTTP/1.0" 404 246 "-" "HD nutch agent/Nutch-1.1 (Think)"
>

--
Lewis

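For anyone trying to reproduce the 404s in the quoted log, here is a minimal Python sketch of the link resolution involved. It is not Nutch code. `resolve` uses the standard library's `urljoin`, which drops the base page's `?C=M;O=D` query when resolving a relative file name. `naive_resolve` is a hypothetical helper that only imitates the pattern seen in the log (the `;O=D` tail of the base URL carried over to each file); the crawler's real logic may well differ.

```python
from urllib.parse import urljoin

# Directory listing URL quoted in the report above.
BASE = "http://alex.dinorcia.net/stuff_i_got_in_emails/?C=M;O=D"


def resolve(base: str, href: str) -> str:
    """Resolve href against base the standard way; the base query is dropped
    whenever href carries a path of its own."""
    return urljoin(base, href)


def naive_resolve(base: str, href: str) -> str:
    """Hypothetical imitation of the logged behaviour (an assumption, not the
    crawler's actual code): carry the text after the last ';' of the base URL
    over to the resolved link."""
    correct = urljoin(base, href)
    _, sep, tail = base.rpartition(";")
    return correct + sep + tail if sep else correct


if __name__ == "__main__":
    for name in ("LeafBlower.jpg", "snowmen3.gif"):
        print("expected:", resolve(BASE, name))
        print("logged:  ", naive_resolve(BASE, name))
```

The first line of each pair is the URL a conforming client would request; the second matches the shape of the requests that got a 404.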
