Tim, Aleksandar,

On 9/8/20 11:18 PM, Aleksandar Lazic wrote:
> On 08.09.20 22:54, Tim Düsterhus wrote:
>> Reinhard,
>> Björn,
>>
>> On 08.09.20 at 21:39, Björn Jacke wrote:
>>>> the only official supported way to identify a google bot is to run a
>>>> reverse DNS lookup on the accessing IP address and run a forward DNS
>>>> lookup on the result to verify that it points to accessing IP address
>>>> and the resulting domain name is in either googlebot.com or google.com
>>>> domain.
>>>> ...
>>>
>>> thanks for asking this again, I brought this up earlier this year and I
>>> got no answer:
>>>
>>> https://www.mail-archive.com/haproxy@formilux.org/msg37301.html
>>>
>>> I would expect that this is something that most sites would actually
>>> want to check and I'm surprised that there is no solution for this
>>> or at least none that is obvious to find.
>>
>> The usually recommended solution for this kind of checks is either Lua
>> or the SPOA, running the actual logic out of process.
>>
>> For Lua my haproxy-auth-request script is a batteries included solution
>> to query an arbitrary HTTP service:
>> https://github.com/TimWolla/haproxy-auth-request. It comes with the
>> drawback that Lua runs single-threaded within HAProxy, so you might not
>> want to use this if the checks need to run in the hot path, handling
>> thousands of requests per second.
>>
>> It should be possible to cache the results of the script using a stick
>> table or a map.
>>
>> Back in nginx times I used nginx' auth_request to query a local service
>> that checked whether the client IP address was a Tor exit node. It
>> worked well.
>>
>> For SPOA there's this random IP reputation service within the HAProxy
>> repository:
>> https://github.com/haproxy/haproxy/tree/master/contrib/spoa_example. I
>> never used the SPOA feature, so I can't comment on whether that example
>> generally works and how hard it would be to extend it.
>> It certainly comes with the restriction that you are limited to C or
>> Python (or a manual implementation of the SPOA protocol) vs anything
>> that speaks HTTP.
>
> In addition to Tim's answer you can also try to use spoa_server which
> supports `-n <workers>`.
> https://github.com/haproxy/haproxy/tree/master/contrib/spoa_server
>

Thanks for your reply and the information. Sorry for my late reply, but
I only had time to test today.

I tried to get the SPOA server working on Ubuntu Bionic (18.04.4) with
haproxy 2.2.3-2ppa1~bionic from the vbernat PPA. I could compile the
SPOA server with Python 3.6 support from the latest GitHub sources
without obvious problems, and it also started without problems with the
example Python script (./spoa -d -f ps_python.py).

If I start haproxy with the following command:

haproxy -f spoa-server.conf -d

haproxy segfaults on the first request to port 10001.

If I start haproxy with the additional parameter -Ws, it does not
segfault, but only the first and every 4th request get (correctly?)
forwarded to the SPOA server; the 3 requests in between are answered
with an empty %[var(sess.iprep.ip_score)].

Here are the logs of a working request:

from haproxy:

00000000:test.accept(0008)=0014 from [127.0.0.1:57570] ALPN=<none>
00000000:test.clireq[0014:ffffffff]: GET / HTTP/1.1
00000000:test.clihdr[0014:ffffffff]: host: localhost:10001
00000000:test.clihdr[0014:ffffffff]: user-agent: curl/7.58.0
00000000:test.clihdr[0014:ffffffff]: accept: */*
00000000:test.clicls[0014:ffffffff]
00000000:test.closed[0014:ffffffff]

from spoa server:

1599906552.714422 [01] New connection from HAProxy accepted
1599906552.714593 [01] Hello handshake done: version=2.0 - max-frame-size=16380 - healthcheck=false
1599906552.714780 [01] Notify frame received: stream-id=0 - frame-id=1
1599906552.714800 [01] Message 'check-client-ip' received
[{'name': '', 'value': True}, {'name': '', 'value': 1234}, {'name': '', 'value': IPv4Address('127.0.0.1')}, {'name': '', 'value': IPv6Address('::55')}, {'name': '', 'value': 'localhost:10001'}]
1599906552.716741 [01] Ack frame sent: stream-id=0 - frame-id=1

And here are the logs of a request that does not work:

from haproxy:

0000001f:test.accept(0008)=0015 from [127.0.0.1:57634] ALPN=<none>
0000001f:test.clireq[0015:ffffffff]: GET / HTTP/1.1
0000001f:test.clihdr[0015:ffffffff]: host: localhost:10001
0000001f:test.clihdr[0015:ffffffff]: user-agent: curl/7.58.0
0000001f:test.clihdr[0015:ffffffff]: accept: */*
0000001f:test.clicls[0015:ffffffff]
0000001f:test.closed[0015:ffffffff]
00000020:spoe-server.srvcls[ffffffff:adfd]
00000020:spoe-server.clicls[ffffffff:adfd]
00000020:spoe-server.closed[ffffffff:adfd]

The SPOA server does not log anything during the request, but after a while the following
lines are logged:

1599906689.387816 [01] New connection from HAProxy accepted
1599906689.387848 [01] Failed to write Agent frame
1599906689.387853 [01] Close the client socket because of I/O errors

Every request works if there are at least 30 seconds between requests,
because after 30 seconds the SPOA server logs that it closes the
connection:

1599907270.605946 [01] Disconnect frame received: reason=normal
1599907270.606078 [01] Disconnect frame sent: reason=normal

But the following also works for the first and the last curl request,
and it takes a lot less than 30 seconds:

curl -i localhost:10001; curl -i localhost:10001; curl -i localhost:10001; curl -i localhost:10001; curl -i localhost:10001

I am unsure whether I am making some stupid mistake, whether I should
test with an older haproxy version, or how to debug the issue further.
Any pointers are very much appreciated.

>> Best regards
>> Tim Düsterhus
>
> Regards
> Aleks
>

Regards
Reinhard
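
P.S. For context, the check quoted at the top of this thread (reverse
DNS lookup on the client IP, then a forward lookup on the result, plus
the googlebot.com / google.com domain test) could be sketched in plain
Python roughly as below. This is only a minimal sketch and is not tied
to the SPOA Python API: the name is_verified_googlebot and the
injectable reverse_lookup / forward_lookup parameters are made up for
illustration, and it has not been run inside spoa_server.

```python
import ipaddress
import socket

# Domains named in the quoted description of the check.
ALLOWED_SUFFIXES = (".googlebot.com", ".google.com")


def _reverse_dns(ip):
    # PTR lookup: returns the primary host name for the address.
    return socket.gethostbyaddr(ip)[0]


def _forward_dns(host):
    # A/AAAA lookup: returns every address the name resolves to.
    return [info[4][0] for info in socket.getaddrinfo(host, None)]


def is_verified_googlebot(ip, reverse_lookup=_reverse_dns,
                          forward_lookup=_forward_dns):
    """Hypothetical helper: True only if `ip` reverse-resolves to a host
    in an allowed domain AND that host resolves back to `ip`."""
    try:
        client = ipaddress.ip_address(ip)
        host = reverse_lookup(ip).rstrip(".").lower()
        # Require a leading dot so "notgoogle.com" does not match.
        if not host.endswith(ALLOWED_SUFFIXES):
            return False
        resolved = forward_lookup(host)
    except (OSError, ValueError):
        # Invalid address or failed lookup: treat as not verified.
        return False
    for addr in resolved:
        try:
            if ipaddress.ip_address(addr) == client:
                return True
        except ValueError:
            continue  # skip anything that is not a plain IP literal
    return False
```

Calling is_verified_googlebot("66.249.66.1") with the defaults performs
real, blocking DNS lookups, so the results would presumably need to be
cached (stick table or map, as Tim suggested) before this is usable in
the request path.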