Hi!

We've used nginx on our FreeBSD systems for what feels like forever, and love it. Over the last few years we've been hit by pretty massive DDoS attacks, and have been employing various tricks in nginx to fend them off. One of them is, of course, rate limiting.

Given a config like:
  limit_req_zone $request zone=unique_request_5:100m rate=5r/s;

and then, in the affected location block:
  limit_req zone=unique_request_5 burst=50 nodelay;

we're getting messages like this:
  could not allocate node in limit_req zone "unique_request_5"
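
For context, the two directives sit together roughly like this (the server block and upstream details below are placeholders; only the two limit_req lines are verbatim):

  limit_req_zone $request zone=unique_request_5:100m rate=5r/s;

  server {
      listen 80;
      server_name example.com;              # placeholder

      location /foo/bar {
          limit_req zone=unique_request_5 burst=50 nodelay;
          proxy_pass http://backend;        # placeholder upstream
      }
  }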

We see this on an otherwise idle node that only gets very sporadic requests. However, it is preceded by a DDoS attack several hours earlier, hitting this exact location block with short requests like
  POST /foo/bar?token=DEADBEEF

When, after a few million requests like this in a short timespan, a "normal" request comes in (*much* longer than the DDoS requests), e.g.
  POST /foo/bar?token=DEADBEEF&moredata=foo&evenmoredata=bar

this is immediately REJECTED by the rate limiter, and we get the aforementioned error in the log.

The current theory, formed after consulting FreeBSD developers far more educated and experienced than myself, is that the LRU eviction in the shared memory zone is falling short: since nearly all of the zone was filled with nodes keyed on short requests, freeing up one (or even two) of them does not release a chunk large enough for the node of a new, longer request. Only an nginx restart clears this up.
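
To illustrate, here is a simplified paraphrase of our reading of the allocation path in src/http/modules/ngx_http_limit_req_module.c (not verbatim source; the comments are our interpretation):

  /* Each tracked key gets a node whose size depends on the key
     length, so the flood of short requests filled the zone with
     correspondingly small chunks. */
  size = offsetof(ngx_rbtree_node_t, color)
         + offsetof(ngx_http_limit_req_node_t, data)
         + key->len;

  node = ngx_slab_alloc_locked(ctx->shpool, size);

  if (node == NULL) {
      /* On failure, only a handful of the oldest (LRU) nodes are
         expired before a single retry. */
      ngx_http_limit_req_expire(ctx, 0);

      node = ngx_slab_alloc_locked(ctx->shpool, size);

      if (node == NULL) {
          /* If every freed chunk is smaller than 'size' (short
             keys!), the retry fails too, and nginx logs
             "could not allocate node in limit_req zone ...". */
          return NGX_ERROR;
      }
  }

If that reading is correct, it would also explain why the condition persists: as far as we understand, the slab allocator serves chunks from per-page size classes, so freeing a few small short-key chunks never yields the larger chunk a longer key needs until an entire page empties, which on a near-idle node effectively never happens before a restart.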

Is there anything we can do to avoid this? I know the API for clearing and monitoring shared memory zones has so far only been available in NGINX Plus, but we are on a strict FOSS-only diet, so using anything like that is obviously out of the question.

Thanks, and take care,
Eirik Øverby
