Hi Maximilian,

What we were missing is the robots.txt itself, i.e. how are you trying to ban
Nutch? I've been in touch with the guys at Traffic Server about your issue
to see if they have suggestions that don't involve totally banning all Nutch
instances from contacting your webserver.
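
For comparison, here is a rough sketch of what a Nutch-only ban could look like and how to sanity-check it. The robots.txt content below is hypothetical (we have not seen your actual file), and the agent token "Nutch" is an assumption; whether a given crawler matches it depends on how that crawler is configured. The check uses Python's standard urllib.robotparser, so it only shows how that parser reads the rules, not how Nutch itself does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; the real file for www.kinoundco.de was not posted.
ROBOTS_TXT = """\
User-agent: Nutch
Disallow: /

User-agent: *
Disallow:
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if user_agent may fetch url under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "http://www.kinoundco.de/Fall/"
    # "SomeOtherBot" is a made-up agent name used only for contrast.
    for agent in ("Nutch", "Nutch-1.2", "SomeOtherBot"):
        print(agent, is_allowed(ROBOTS_TXT, agent, url))
```

If your file looks different from this, please post it so we can see which agent token it names.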

To all devs: the other thing that strikes me as odd is the User-Agent
string. Is this really how Nutch identifies itself?

Thanks

Lewis

2011/11/16 Maximilian Laurenz <[email protected]>

> All requests seem to come from a German company called
> http://www.pixray.com, which obviously ignores the robots.txt with their
> version of the Nutch crawler. We informed them and will ban their IP range
> if they don’t stop scanning us with invalid requests.
>
> Sincerely,
>
> *Maximilian Laurenz*
> *S&L Medien Gruppe GmbH*
> Aidenbachstraße 54
> 81379 München
> Tel. +49 89 790862-49
> Fax +49 89 790862-55
> [email protected]
> http://www.slmedien.de
>
> S&L Medien Gruppe GmbH | Managing directors: Maria-Theresia von Seidlein,
> Torsten Weihrich, Olaf Wiehler | Registered office: München |
> Amtsgericht München (district court) | HRB 99977
>
> *From:* Maximilian Laurenz
> *Sent:* Wednesday, 2 November 2011 14:14
> *To:* '[email protected]'
> *Subject:* Nutch ignores robots.txt
>
> Hi there,
>
> Because a Nutch client seems to cause errors on our web server, we changed
> robots.txt for www.kinoundco.de to disallow Nutch. Unfortunately we still
> get requests:
>
> 2011-11-01 05:50:35 W3SVC4 MB 62.128.28.16 GET
> /Rango/+(Math.random()*100000)+ - 80 - 188.40.65.130 HTTP/1.0
> Mozilla/5.0+(Windows+NT+6.1;+WOW64;+rv:2.0.1)+Gecko/20100101++++Firefox/4.0.1+++/Nutch-1.2
> - - www.kinoundco.de 302 0 373 263 0
>
> 2011-11-01 05:52:23 W3SVC4 MB 62.128.28.16 GET /default.aspx
> aspxerrorpath=/Atemlos-Gefaehrliche-Wahrheit/+(Math.random()*100000)+ 80 -
> 188.40.65.130 HTTP/1.0
> Mozilla/5.0+(Windows+NT+6.1;+WOW64;+rv:2.0.1)+Gecko/20100101++++Firefox/4.0.1+++/Nutch-1.2
> - - www.kinoundco.de 200 0 32449 315 15
>
> 2011-11-01 05:59:15 W3SVC4 MB 62.128.28.16 GET
> /Kleine-wahre-Luegen/+(Math.random()*100000)+ - 80 - 188.40.65.130 HTTP/1.0
> Mozilla/5.0+(Windows+NT+6.1;+WOW64;+rv:2.0.1)+Gecko/20100101++++Firefox/4.0.1+++/Nutch-1.2
> - - www.kinoundco.de 302 0 401 277 15
>
> 2011-11-01 05:59:31 W3SVC4 MB 62.128.28.16 GET
> /Nachtasyl/+(Math.random()*100000)+ - 80 - 188.40.65.130 HTTP/1.0
> Mozilla/5.0+(Windows+NT+6.1;+WOW64;+rv:2.0.1)+Gecko/20100101++++Firefox/4.0.1+++/Nutch-1.2
> - - www.kinoundco.de 302 0 381 267 0
>
> 2011-11-01 06:35:30 W3SVC4 MB 62.128.28.16 GET
> /Auf-der-anderen-Seite-der-Leinwand-100-Jahre-Moviemento/+(Math.random()*100000)+
> - 80 - 78.46.90.27 HTTP/1.0
> Mozilla/5.0+(Windows+NT+6.1;+WOW64;+rv:2.0.1)+Gecko/20100101++++Firefox/4.0.1+++/Nutch-1.2
> - - www.kinoundco.de 302 0 473 313 15
>
> 2011-11-01 06:35:33 W3SVC4 MB 62.128.28.16 GET
> /Zoowaerter/+(Math.random()*100000)+ - 80 - 78.46.90.27 HTTP/1.0
> Mozilla/5.0+(Windows+NT+6.1;+WOW64;+rv:2.0.1)+Gecko/20100101++++Firefox/4.0.1+++/Nutch-1.2
> - - www.kinoundco.de 302 0 383 268 31
>
> 2011-11-01 06:35:40 W3SVC4 MB 62.128.28.16 GET /default.aspx
> aspxerrorpath=/Sascha/+(Math.random()*100000)+ 80 - 78.46.90.27 HTTP/1.0
> Mozilla/5.0+(Windows+NT+6.1;+WOW64;+rv:2.0.1)+Gecko/20100101++++Firefox/4.0.1+++/Nutch-1.2
> - - www.kinoundco.de 200 0 32449 292 62
>
> 2011-11-01 06:36:35 W3SVC4 MB 62.128.28.16 GET
> /Beschissenheit-der-Dinge/+(Math.random()*100000)+ - 80 - 78.46.90.27 HTTP/1.0
> Mozilla/5.0+(Windows+NT+6.1;+WOW64;+rv:2.0.1)+Gecko/20100101++++Firefox/4.0.1+++/Nutch-1.2
> - - www.kinoundco.de 302 0 411 282 31
>
> 2011-11-01 06:38:14 W3SVC4 MB 62.128.28.16 GET
> /Auf-der-Suche/+(Math.random()*100000)+ - 80 - 78.46.90.27 HTTP/1.0
> Mozilla/5.0+(Windows+NT+6.1;+WOW64;+rv:2.0.1)+Gecko/20100101++++Firefox/4.0.1+++/Nutch-1.2
> - - www.kinoundco.de 302 0 389 271 15
>
> 2011-11-01 06:39:55 W3SVC4 MB 62.128.28.16 GET
> /Fall/+(Math.random()*100000)+ - 80 - 78.46.90.27 HTTP/1.0
> Mozilla/5.0+(Windows+NT+6.1;+WOW64;+rv:2.0.1)+Gecko/20100101++++Firefox/4.0.1+++/Nutch-1.2
> - - www.kinoundco.de 302 0 371 262 15
>
> 2011-11-01 07:51:10 W3SVC4 MB 62.128.28.16 GET
> /Midnight-in-Paris/+(Math.random()*100000)+ - 80 - 176.9.26.236 HTTP/1.0
> Mozilla/5.0+(Windows+NT+6.1;+WOW64;+rv:2.0.1)+Gecko/20100101++++Firefox/4.0.1+++/Nutch-1.2
> - - www.kinoundco.de 302 0 397 275 0
>
> 2011-11-01 07:51:40 W3SVC4 MB 62.128.28.16 GET
> /Betty-Anne-Waters/+(Math.random()*100000)+ - 80 - 176.9.26.236 HTTP/1.0
> Mozilla/5.0+(Windows+NT+6.1;+WOW64;+rv:2.0.1)+Gecko/20100101++++Firefox/4.0.1+++/Nutch-1.2
> - - www.kinoundco.de 302 0 397 275 15
>
> Sincerely,
>
> Max
>



-- 
*Lewis*
