On 06/03/2026 22:28, Dave Polaschek wrote:
> Identifying the crawlers is (almost) as simple as three ifs in a trenchcoat:
> https://chronicles.mad-scientist.club/tales/surviving-the-crawlers/#three-ifs-in-a-trenchcoat
only 1 of those 3 ifs would be applicable here, but that's better than none.

> As for filtering by ip address, I found on my test website that blocking AWS,
> Azure, Digital Ocean, and a few other “cloud providers” stopped most of the
> AI crawlers, but the ones that got through after I blocked those were using
> an automated chrome running from residential IP addresses, and were even more
> aggressive about crawling than the ones coming from the cloud providers.
> These are nasty, because they are real browsers from real residential
> addresses, using some sort of browser extension to do the crawling
> unbeknownst to the human user.

no, don't block IPs. score requests. what rspamd does for mail, you should do for http requests, more or less.

neutral request: 0 points
bad protocol compliance? +1
bad IP? +1
bad user agent? +1

so a normal browser gets 0 points, vs a suspected bot, 1 point, and that's that.

blocking requests by IP ends up cutting off legit VPN traffic... and now that we need VPNs to access our pornhubs and spankbangs here in Europe, their usage will only intensify.
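to make the scoring idea concrete, here's a minimal sketch in Python. the three checks mirror the list above, but the actual heuristics are assumptions for illustration: `BAD_NETWORKS` uses RFC 5737 documentation ranges as placeholders, `BAD_UA_MARKERS` is a made-up list, and `Request`, `score_request` and `is_suspected_bot` are hypothetical names, not rspamd's API or anyone's real implementation. headers are looked up case-sensitively to keep it short.

```python
import ipaddress
from dataclasses import dataclass, field

# Placeholder "bad" networks (RFC 5737 documentation ranges); a real setup
# would load whatever ranges it considers suspicious.
BAD_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]

# Hypothetical user-agent substrings treated as suspicious.
BAD_UA_MARKERS = ("python-requests", "curl/", "scrapy")


@dataclass
class Request:
    """Minimal stand-in for an incoming HTTP request."""
    client_ip: str
    http_version: str
    headers: dict = field(default_factory=dict)


def bad_protocol(req: Request) -> bool:
    # Assumed heuristic: browsers use HTTP/1.1 or newer and send
    # Accept and Accept-Language headers.
    if req.http_version == "HTTP/1.0":
        return True
    return "Accept" not in req.headers or "Accept-Language" not in req.headers


def bad_ip(req: Request) -> bool:
    # Membership test is False when IP versions differ (v4 vs v6).
    addr = ipaddress.ip_address(req.client_ip)
    return any(addr in net for net in BAD_NETWORKS)


def bad_user_agent(req: Request) -> bool:
    # Empty or tool-like user agents count as bad.
    ua = req.headers.get("User-Agent", "").lower()
    return ua == "" or any(marker in ua for marker in BAD_UA_MARKERS)


def score_request(req: Request) -> int:
    """Each failed check adds one point; 0 means a neutral request."""
    return sum((bad_protocol(req), bad_ip(req), bad_user_agent(req)))


def is_suspected_bot(req: Request, threshold: int = 1) -> bool:
    # Threshold is tunable instead of being a hard IP block.
    return score_request(req) >= threshold


browser_headers = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) Firefox/128.0",
    "Accept": "text/html",
    "Accept-Language": "en",
}
browser = Request("192.0.2.10", "HTTP/1.1", browser_headers)
vpn_user = Request("203.0.113.50", "HTTP/2", browser_headers)
bot = Request("203.0.113.7", "HTTP/1.1", {"User-Agent": "python-requests/2.31"})

for name, req in (("browser", browser), ("vpn_user", vpn_user), ("bot", bot)):
    print(name, score_request(req), is_suspected_bot(req))
```

note how the VPN user only collects a point from the IP check while the scripted client trips all three; keeping the threshold as a parameter lets you treat one weak signal more gently (rate-limit, challenge) than a pile of them, instead of cutting off the whole address range.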

