Hello! On Sat, Sep 27, 2025 at 02:28:11PM -0400, Paul wrote:
> On 9/27/25 03:08, Maxim Dounin wrote:
> > Hello!
> 
> Maxim, many thanks.  Currently battling a DDoS including out of control
> "AI".  Front end nginx/1.18.0 (Ubuntu) easily handles volume (CPU usage
> rarely above 1%) but proxied apache2 often runs up to 98% across 12 cores
> (complex cgi needs 20-40 ms per response.)
> 
> I'm attempting to mitigate.  Your advice appreciated.  I've "snipped" below
> for readability:
> 
> [snip]
> 
> > > I am currently (a bit "hit and miss") using :
> > >
> > > proxy_buffering on;    # maybe helps proxied apache2 ?
> > 
> > Proxy buffering is on by default (see
> > http://freenginx.org/r/proxy_buffering), so there is no need to
> > switch it on unless you've switched it off at previous
> > configuration levels.
> 
> Understood, thanks -- I had two lines (rem'd in or out for testing purposes)
> trying to respect genuine requests from regular users.  Given that nginx has
> a lot of spare capacity, could this be better tuned to alleviate the load on
> the back end?  I've read your doc, but in a production environment, I'm
> unsure of the implications of "proxy_buffers number size;" and
> "proxy_busy_buffers_size size;"

In general, "proxy_buffering on" (the default) is there to minimize 
usage of backend resources: it is designed to read the response from 
the backend as fast as possible into nginx buffers, so the backend 
connection can be released and/or closed even if the client is slow 
and sending the response to the client takes significant time.

It is not that important nowadays, since clients are usually fast 
now, yet it still can help in some cases.  Unlikely in the case of 
AI scrapers though.

Other related settings, such as proxy_buffers, control what nginx 
does with the buffers, and are mostly needed to optimize processing 
on the nginx side.  In particular, larger proxy_buffers might be 
needed if you want to keep more data in memory (vs. disk buffering).
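For example, if you know your CGI responses are typically up to 
about 64k and you want to keep them entirely in memory, something 
like the following could be used (the sizes here are purely 
illustrative, not a recommendation; note that proxy_busy_buffers_size 
has to be adjusted to stay consistent with non-default proxy_buffers):

    location /cgi-bin/ {
        proxy_pass http://127.0.0.1:8080;   # backend address is a placeholder

        proxy_buffer_size 4k;     # buffer for the response headers
        proxy_buffers 16 4k;      # up to 64k of response body in memory
        proxy_busy_buffers_size 8k;
    }

Responses larger than the configured buffers will still be spooled 
to disk (up to max_temp_file_size), so the backend connection is 
released either way.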
As long as responses are small enough to fit into existing memory 
buffers (4k proxy_buffer_size + 8 * 4k proxy_buffers == 36k by 
default), you probably don't need to tune anything.

The proxy_busy_buffers_size directive controls how many memory 
buffers can be used to send the response to the client (vs. writing 
the response to the file-based buffer).  It often needs to be 
explicitly configured to ensure it matches non-default proxy_buffers 
settings, but otherwise there isn't much need to tune it.

> > > connection_pool_size 512;
> > > client_header_buffer_size 512;
> > > large_client_header_buffers 4 512;
> > 
> > Similarly, I would rather use the default values unless you
> > understand why you want to change these.
> 
> Maybe mistakenly, I was trying to eliminate stupidly artificial cgi requests
> -- "GET /cgi-bin/....." that ran several kilobytes long.  The backend apache
> could "swallow" them (normally a 404) but I was trying to eliminate the
> overhead.

If the goal is to stop requests with very long URIs, using an 
explicit regular expression to limit such URIs might be a better 
option.  For example:

    if ($request_uri ~ ".{256}") {
        return 444;
    }

The regular expression matches any request URI with 256 or more 
characters, and such requests are rejected.

> > > location ~ \.php$ { return 444; }
> 
> You did not mention this, but it does not appear to work well.  access.log
> today gives hundreds of:
> 
> 104.46.211.169 - - [27/Sep/2025:12:32:12 +0000] "GET /zhidagen.php HTTP/1.1"
> 404 5013 "-" "-"
> 
> and the 5013 bytes is our "404-solr-try-again" page, not the 444 expected.

This indicates there is something wrong with the configuration.  
Possible issues include:

- The location being configured in the wrong/other server{} block.

- Other locations with regular expressions interfering and taking 
  precedence.

From the details provided I suspect it's a 404 from nginx, so it 
might be simply a request handled in an unrelated server{} block?
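To illustrate the first possibility: if requests arrive with a Host 
that doesn't match any server_name, they are handled by the default 
server, which may be a different server{} block than the one with 
the location.  A minimal sketch (example.com and the backend address 
are placeholders) which makes the split explicit:

    server {
        listen 80 default_server;
        return 444;                 # drop requests to unknown hosts
    }

    server {
        listen 80;
        server_name example.com;

        location ~ \.php$ {
            return 444;             # matched before prefix locations below
        }

        location / {
            proxy_pass http://127.0.0.1:8080;
        }
    }

Checking which server{} block actually logged the 404 (e.g. with a 
per-server access_log) should confirm or rule this out.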
> > Also, depending on the traffic pattern you are seeing, it might be
> > a good idea to configure limit_req / limit_conn with appropriate
> > limits.
> 
> Again thanks, I had tried various 'location' lines such as
> 
>     limit_req_zone $binary_remote_addr zone=mylimit:5m rate=1r/s;
>     limit_req zone=mylimit burst=5 nodelay;
> 
> without success... obviously haven't fully understood

Depending on the traffic pattern, limiting per $binary_remote_addr 
might not be effective.  In particular, AI scrapers I've observed 
tend to use lots of IP addresses, and limiting them based on the IP 
address alone doesn't work well.

For freenginx.org source code repositories I currently use something 
like this to limit abusive behaviour (yet still allow automated 
requests when needed, such as for non-abusive search engine indexing 
and repository cloning):

    map $binary_remote_addr $net24 {
        ~^(\C\C\C)  $1;
    }

    map $binary_remote_addr $net16 {
        ~^(\C\C)  $1;
    }

    map $binary_remote_addr $net8 {
        ~^(\C)  $1;
    }

    limit_conn_zone $binary_remote_addr zone=conns:1m;
    limit_conn_zone $net24 zone=conns24:1m;
    limit_conn_zone $net16 zone=conns16:1m;
    limit_conn_zone $net8 zone=conns8:1m;

Additionally, I use the following to limit the most abusive AI 
scrapers with multiple netblocks, mostly filled with netblocks 
manually:

    geo $remote_addr $netname {
        # AS45102, Alibaba Cloud LLC
        47.74.0.0/15   AS45102;
        47.80.0.0/13   AS45102;
        47.76.0.0/14   AS45102;

        # AS32934, Facebook, netblocks observed in logs
        57.141.0.0/16  AS32934;
        57.142.0.0/15  AS32934;
        57.144.0.0/14  AS32934;
        57.148.0.0/15  AS32934;

        # Huawei netblocks, from geofeed in whois records
        1.178.32.0/23  HW;
        ...
    }

    limit_conn_zone $netname zone=connsname:1m;

With the following limits in proxied locations:

    limit_conn conns 5;
    limit_conn conns24 10;
    limit_conn conns16 20;
    limit_conn conns8 30;
    limit_conn connsname 10;

The backend is configured to serve 30 parallel requests and has a 
listen queue of 128 (Apache httpd with "MaxRequestWorkers 30").
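If request-rate limiting is also wanted, the same per-netblock keys 
can be reused with limit_req, so the rate is shared across a whole 
/16 rather than a single address.  A sketch (the rate, burst, and 
upstream address are illustrative only):

    limit_req_zone $net16 zone=req16:1m rate=10r/s;

    location / {
        limit_req zone=req16 burst=20;
        limit_conn conns16 20;
        proxy_pass http://127.0.0.1:8080;
    }

Note that limit_req with a burst (and without "nodelay") delays 
excess requests rather than rejecting them outright, which tends to 
be gentler on legitimate clients behind large NATs.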
With the above limits it currently works without issues, ensuring no 
errors and reasonable response times for all users.

If the goal is to stop all automated scraping, using some JS-based 
challenge as already recommended in this thread might be a better 
option.

Hope this helps.

-- 
Maxim Dounin
http://mdounin.ru/
