Reinhard, Björn,

On 08.09.20 at 21:39, Björn Jacke wrote:
>> the only official supported way to identify a google bot is to run a
>> reverse DNS lookup on the accessing IP address and run a forward DNS
>> lookup on the result to verify that it points to accessing IP address
>> and the resulting domain name is in either googlebot.com or google.com
>> domain.
>> ...
>
> thanks for asking this again, I brought this up earlier this year and I
> got no answer:
>
> https://www.mail-archive.com/[email protected]/msg37301.html
>
> I would expect that this is something that most sites would actually
> want to check and I'm surprised that there is no solution for this or at
> least none that is obvious to find.
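The forward-confirmed reverse DNS check described in the quoted message can be sketched in Python, one of the languages an out-of-process checker might be written in. This is a minimal sketch assuming Python 3 and the system resolver; the function names (hostname_allowed, forward_addresses, is_googlebot) are hypothetical, and it does no caching and sets no DNS timeouts.

```python
import ipaddress
import socket

# Domains named in the quoted procedure. A PTR name must equal one of these
# or be a subdomain of it, so "evilgoogle.com" does not match "google.com".
ALLOWED_DOMAINS = ("googlebot.com", "google.com")


def hostname_allowed(hostname: str) -> bool:
    """Return True if hostname is inside googlebot.com or google.com."""
    name = hostname.rstrip(".").lower()
    return any(name == d or name.endswith("." + d) for d in ALLOWED_DOMAINS)


def forward_addresses(hostname: str) -> set:
    """Resolve hostname (A and AAAA) to a set of ipaddress objects."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except OSError:
        return set()
    # info[4] is the sockaddr tuple; its first element is the address string.
    return {ipaddress.ip_address(info[4][0]) for info in infos}


def is_googlebot(ip: str) -> bool:
    """Forward-confirmed reverse DNS check for an accessing IP address."""
    try:
        client = ipaddress.ip_address(ip)
    except ValueError:
        return False  # not a valid IPv4/IPv6 address
    try:
        # 1. Reverse (PTR) lookup of the accessing address.
        hostname, _aliases, _addrs = socket.gethostbyaddr(str(client))
    except OSError:  # socket.herror and socket.gaierror subclass OSError
        return False
    # 2. The resulting name must be in googlebot.com or google.com.
    if not hostname_allowed(hostname):
        return False
    # 3. A forward lookup of that name must point back to the accessing IP.
    return client in forward_addresses(hostname)


if __name__ == "__main__":
    import sys

    # Usage: python check.py 66.249.66.1 203.0.113.7
    for arg in sys.argv[1:]:
        print(arg, is_googlebot(arg))
```

Each call performs two blocking DNS lookups, which is why this kind of logic is better run out of process with its result cached rather than evaluated inline for every request.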
The usually recommended solution for this kind of check is either Lua or the SPOA (Stream Processing Offload Agent), with the actual logic running out of process.

For Lua, my haproxy-auth-request script is a batteries-included solution for querying an arbitrary HTTP service: https://github.com/TimWolla/haproxy-auth-request. Its drawback is that Lua runs single-threaded within HAProxy, so you might not want to use it if the checks need to run in the hot path, handling thousands of requests per second. It should be possible to cache the results of the script using a stick table or a map.

Back in my nginx days I used nginx's auth_request to query a local service that checked whether the client IP address was a Tor exit node. It worked well.

For SPOA there is this random IP reputation example service within the HAProxy repository: https://github.com/haproxy/haproxy/tree/master/contrib/spoa_example. I have never used the SPOA feature, so I can't comment on whether that example works in general or how hard it would be to extend. It certainly comes with the restriction that you are limited to C or Python (or a manual implementation of the SPOA protocol), whereas the Lua approach works with anything that speaks HTTP.

Best regards
Tim Düsterhus

