I see another poster has written this and deleted it afterwards: `This is almost certainly not Google as they obey robots.txt. The & to & conversion is another sign of a poor quality crawler. Check the RDNS and you will find it's probably some IP faking Google UA, I suggest blocking at network level.`
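The reverse-DNS check suggested in the quoted reply is usually done as forward-confirmed reverse DNS, in three steps:

1. Look up the PTR record for the client IP.
2. Check that the hostname belongs to a Google domain.
3. Resolve that hostname forward and confirm that it maps back to the same IP.

A minimal Python sketch follows. The helper names are hypothetical. The accepted suffixes in `GOOGLE_SUFFIXES` are an assumption; Google's crawler-verification documentation is the authority on the current list. The resolvers are injectable so the logic can be exercised without network access. Note that `socket.gethostbyname_ex` only returns IPv4 addresses.

```python
import socket
from typing import Callable, Iterable, Tuple

# Assumption: domain suffixes accepted as Google. Check Google's current
# crawler-verification documentation, which may list more domains.
GOOGLE_SUFFIXES: Tuple[str, ...] = (".googlebot.com", ".google.com")


def is_google_hostname(hostname: str, suffixes: Iterable[str] = GOOGLE_SUFFIXES) -> bool:
    """Return True if the PTR hostname falls under one of the accepted domains."""
    host = hostname.rstrip(".").lower()  # drop a trailing root dot, normalise case
    return any(host.endswith(s) for s in suffixes)


def verify_google_ip(
    ip: str,
    reverse: Callable[[str], str] = lambda addr: socket.gethostbyaddr(addr)[0],
    forward: Callable[[str], Iterable[str]] = lambda host: socket.gethostbyname_ex(host)[2],
) -> bool:
    """Forward-confirmed reverse DNS: IP -> hostname -> IPs must include the original IP."""
    try:
        hostname = reverse(ip)  # PTR lookup
    except OSError:  # socket.herror / socket.gaierror are OSError subclasses
        return False
    if not is_google_hostname(hostname):
        return False  # PTR points outside the accepted domains
    try:
        addresses = forward(hostname)  # A-record lookup of the PTR name
    except OSError:
        return False
    return ip in addresses  # the forward lookup must confirm the original IP
```

For example, `verify_google_ip("72.14.199.18")` performs live DNS lookups. Its result therefore depends on the records published at the time it runs.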
My actual reply:
1 - It is Google.
2 - They do not always use a user-friendly user agent. That is a fact.
3 - When they don't, they also don't follow robots.txt.

So my problem remains. I don't want to block those IP ranges at the iptables level, because it's Google. So a rewrite or a redirect (I'm not sure exactly which at the moment) is badly needed. It depends on the URL.

The IP ranges are definitely Google's. They are referenced in https://github.com/mitchellkrogza/nginx-ultimate-bad-bot-blocker/issues/175 and listed in my original message, a copy of which follows:

"Hi,

I'm still faithful to your script. It does great things for my websites. Thanks for that.

This is not a bug, strictly speaking, just an observation you might find useful. Recently, over the last 1-2 months, I got a lot of strange, impossible requests, all with the same User-Agent, no referrer and HTTP/1.1. All came from Google. They do not respect robots.txt and sniff everywhere they're not supposed to. I thought you should be made aware of it. I know you whitelist Google IPs, but after inspection by other users, you might want to revisit those.

User-agent: "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.71 Safari/537.36"

Ranges:
66.249.64.0/19
72.14.199.0/24

Examples of requests:
72.14.199.18 - - [27/May/2018:14:12:01 -0700] "GET /page.php?page%3Dabout_himeji_forklifts& HTTP/1.1" 301 178 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.71 Safari/537.36"
72.14.199.4 - - [27/May/2018:14:12:24 -0700] "GET /page.php?page%3Dabout_himeji_forklifts& HTTP/1.1" 302 165 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.71 Safari/537.36"

In the meantime, I circumvented your whitelist by issuing manual range bans. After 6 weeks, there are no more of those strange requests, and bandwidth has dropped significantly, since those 2 ranges were requesting a few hundred megabytes each day!

Thanks again."
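Before choosing between a rewrite and a redirect, it helps to pick these requests out of an access log. Each line can be matched against the two ranges and the User-Agent quoted above. The sketch below assumes nginx's default `combined` log format. It flags a request only when all three conditions hold: the IP falls in one of the ranges, the User-Agent matches exactly, and there is no referrer. `canonical_target` is a hypothetical helper showing one possible rewrite target. It assumes the intended URL is the same path with the escaped `=` (`%3D`) restored and the trailing `&` dropped, which may not hold for every URL.

```python
import ipaddress
import re
from typing import Dict, Optional

# Ranges and User-Agent as reported in the message above.
FLAGGED_NETWORKS = [
    ipaddress.ip_network("66.249.64.0/19"),
    ipaddress.ip_network("72.14.199.0/24"),
]
FLAGGED_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/28.0.1500.71 Safari/537.36"
)

# Assumption: nginx "combined" format:
# $remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent"
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\S+) "(?P<referer>[^"]*)" "(?P<ua>[^"]*)"'
)


def parse_line(line: str) -> Optional[Dict[str, str]]:
    """Split one log line into named fields, or return None if it doesn't match."""
    m = LOG_RE.match(line)
    return m.groupdict() if m else None


def is_flagged(line: str) -> bool:
    """True if the request comes from a listed range, with the listed UA and no referrer."""
    entry = parse_line(line)
    if entry is None:
        return False
    try:
        addr = ipaddress.ip_address(entry["ip"])
    except ValueError:
        return False  # not an IP literal
    in_range = any(addr in net for net in FLAGGED_NETWORKS)
    return in_range and entry["ua"] == FLAGGED_UA and entry["referer"] == "-"


def canonical_target(path: str) -> Optional[str]:
    """Hypothetical rewrite target: restore an escaped '=' and drop a trailing '&'."""
    base, sep, query = path.partition("?")
    if not sep or "%3D" not in query.upper():
        return None  # nothing to fix for this URL
    fixed = re.sub(r"%3[dD]", "=", query).rstrip("&")
    return f"{base}?{fixed}"
```

Iterating over an access log and counting the lines for which `is_flagged(line)` is true gives a rough idea of how much traffic these ranges generate before any ban is applied.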
Posted at Nginx Forum: https://forum.nginx.org/read.php?2,280093,280117#msg-280117

_______________________________________________
nginx mailing list
http://mailman.nginx.org/mailman/listinfo/nginx
