On 01/09/2017 04:04 PM, Jesper Dangaard Brouer wrote:
> This patch split the global and per (inet)peer ICMP-reply limiter
> code, and moves the global limit check to earlier in the packet
> processing path. Thus, avoid spending cycles on ICMP replies that
> gets limited/suppressed anyhow.
>
> The global ICMP rate limiter icmp_global_allow() is a good solution,
> it just happens too late in the process. The kernel goes through the
> full route lookup (return path) for the ICMP message, before taking
> the rate limit decision of not sending the ICMP reply.
>
> Details: The kernels global rate limiter for ICMP messages got added
> in commit 4cdf507d5452 ("icmp: add a global rate limitation"). It is
> a token bucket limiter with a global lock. It brilliantly avoids
> locking congestion by only updating when 20ms (HZ/50) were elapsed. It
> can then avoids taking lock when credit is exhausted (when under
> pressure) and time constraint for refill is not yet meet.
This patch removed the rate limit bypass for localhost. As a result, it
is impossible to write deterministic UDP client tests tests which
exercise failover behavior in response to unreachable servers.
H.J. Lu noted that a glibc test started failing on kernel 4.11 and
identified the regression:
https://sourceware.org/ml/libc-alpha/2017-06/msg00167.html
(I have more tests which are afflicted by this, but are not yet in glibc
upstream.)
This is particularly annoying because we already run such tests in a
network namespace for isolation, but the rate limit counter is global,
so that doesn't help here.
I'm attaching a self-contained test case. It fails for me with:
localhost-icmp: iteration 50: no ICMP message (poll timeout)
On kernel 4.10, it passes and runs within just a few milliseconds.
Would you please fix this in some way? Thanks.
Florian
#include <arpa/inet.h>
#include <err.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/poll.h>
#include <sys/socket.h>
#include <unistd.h>
/* How many UDP packets to send to a non-responding part. */
enum { ITERATIONS = 1000 };
int
main (void)
{
/* Pick a port number which is likely unused. */
unsigned short port;
{
int sock = socket (AF_INET, SOCK_DGRAM, 0);
if (sock < 0)
err (1, "socket");
struct sockaddr_in sin = { .sin_family = AF_INET };
if (bind (sock, (struct sockaddr *) &sin, sizeof (sin)) < 0)
err (1, "bind");
socklen_t sinlen = sizeof (sin);
if (getsockname (sock, (struct sockaddr *) &sin, &sinlen))
err (1, "getsockname");
if (sinlen != sizeof (sin) || sin.sin_family != AF_INET)
errx (1, "wrong address information for socket");
if (close (sock) < 0)
err (1, "close");
port = sin.sin_port;
}
for (int i = 0; i < ITERATIONS; ++i)
{
int sock = socket (AF_INET, SOCK_DGRAM, 0);
if (sock < 0)
err (1, "socket");
struct sockaddr_in sin =
{
.sin_family = AF_INET,
.sin_addr = { ntohl (INADDR_LOOPBACK) },
.sin_port = port,
};
if (connect (sock, (struct sockaddr *) &sin, sizeof (sin)) < 0)
err (1, "connect");
if (sendto (sock, "", 1, 0, NULL, 0) < 0)
err (1, "sendto");
struct pollfd fd = { .fd = sock, .events = POLLIN };
int ret = poll (&fd, 1, 5000);
if (ret < 0)
err (1, "poll");
if (ret == 0)
errx (1, "iteration %d: no ICMP message (poll timeout)", i);
if (close (sock) < 0)
err (1, "close");
}
}