On Wed, 13 May 2026 14:24:43 -0300 Ricardo Robaina wrote:
> When auditd is bottlenecked (e.g., by slow disk I/O), kauditd blocks on
> the netlink socket.

Holding the socket lock during slow I/O sounds very wrong. One could say
that's abuse of the socket lock?

> If the wait timeout fully expires (timeo == 0),
> netlink mistakenly interprets the zeroed timeout as a non-blocking
> request. It then triggers netlink_overrun that drops the event,
> completely bypassing the audit subsystem's internal retry queue, and
> falsely returns ENOBUFS to user-space, resulting in the following error:
> 
>  auditd[]: Error receiving audit netlink packet (No buffer space available)
> 
> Fix this by detecting when a blocking sender's timeout has expired
> (timeo == 0 && !nonblock) in netlink_unicast(). In this case, instead
> of retrying with timeo=0 (which would incorrectly trigger netlink_overrun
> on the next iteration), safely free the skb and return -EAGAIN, allowing
> the audit subsystem to gracefully enqueue the pending event into its
> internal backlog.

The socket _is_ the queue, normally.

Please explore fixing this in audit?
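
For reference, a rough user-space model of the timeout behaviour as the
quoted description states it. This is only a sketch under assumptions: it
is not kernel code, toy_sock, toy_unicast and the patched flag are
hypothetical names, and the "sleep" is collapsed into a single step where
the reader never drains the queue.

```c
#include <errno.h>
#include <stdbool.h>

/* Toy user-space model; every name here is hypothetical, not a kernel API. */
struct toy_sock {
    int queued;   /* messages sitting in the receive queue */
    int capacity; /* receive queue limit */
};

/*
 * Returns 0 on delivery, -ENOBUFS when the message is dropped as an
 * overrun, and -EAGAIN when the caller is asked to retry later.
 * `patched` selects the behaviour proposed in the quoted text.
 */
static int toy_unicast(struct toy_sock *sk, long timeo, bool nonblock,
                       bool patched)
{
    for (;;) {
        if (sk->queued < sk->capacity) {
            sk->queued++;
            return 0;
        }
        if (timeo == 0) {
            /* Proposed check: a blocking sender whose wait expired. */
            if (patched && !nonblock)
                return -EAGAIN;
            /* Otherwise a zero timeout looks like a non-blocking send. */
            return -ENOBUFS;
        }
        /* Stand-in for sleeping: the reader never drains, the wait expires. */
        timeo = 0;
    }
}
```

With a full queue, a blocking sender gets -ENOBUFS in the unpatched model
and -EAGAIN in the patched one; a sender that was non-blocking from the
start gets -ENOBUFS either way.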
-- 
pw-bot: cr
