We are experiencing strange failures where the audit daemon fails to start on boot, hitting an ENOBUFS error on its audit_set_pid() call. This can be reproduced by repeatedly restarting the audit daemon while the system is under heavy audit load. This also seems to be dependent on the number of CPUs - we can reproduce this with 2 CPUs but not with 48.

Tracing showed a race between the kernel enabling audit messages to be sent to the daemon and actually sending the ACK, wherein the socket buffer could get filled by audit messages before the ACK could be sent, leading to the ACK being dropped and ENOBUFS set on the socket by netlink code. A patch to mitigate this race from the kernel side is separately under discussion on the audit subsystem mailing list:
https://lore.kernel.org/audit/20230922152749.244197-1-chris.ric...@nutanix.com/

It's worth noting that this is almost certainly the same issue observed in this thread from last month (participants CCed):
https://listman.redhat.com/archives/linux-audit/2023-September/020087.html

Here, I am hoping to discuss ACK handling from the userspace side. The current implementation is a little odd - check_ack() will happily return success without seeing an ACK if a non-ACK message is top of the socket queue, but will fail if no message arrives within the timeout. It also of course fails if ENOBUFS is set on the socket, but this failure only seems to matter when doing audit_set_pid() - similar failures during main-loop message processing are logged but otherwise ignored, as far as I can tell.

I'm not sure I quite understand the intentions of the code, but it seems odd to let ENOBUFS be a fatal error here, given that it likely means the socket buffer got flooded with audit messages, and thus audit_set_pid() succeeded. Perhaps we should just ignore ENOBUFS or even set NETLINK_NO_ENOBUFS? It may also be worth increasing the netlink socket buffer size, though this could only make the issue less likely and would not be sufficient under arbitrarily heavy audit loads.

Finally, there is another oddity in audit_set_pid() that is tangential to this discussion but worth highlighting: if the wmode parameter is WAIT_YES, then there is some additional ACK-handling which waits for 100 milliseconds and eats the top message of the socket queue if one arrives, without inspecting it.
This seems completely wrong: the ACK, if there was one, will already have been consumed by check_ack(), so at best this code does nothing, and at worst (quite likely) it swallows a genuine audit message without ever recording it.

- Chris
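
P.S. For concreteness, here is a rough sketch of the two socket-level mitigations floated above (setting NETLINK_NO_ENOBUFS and enlarging the receive buffer). The helper names netlink_disable_enobufs() and netlink_grow_rcvbuf() are hypothetical and are not part of libaudit; fd is assumed to be the already-open audit netlink socket. This only illustrates the setsockopt() calls involved and is not a proposed patch.

```c
#include <errno.h>
#include <sys/socket.h>
#include <linux/netlink.h>

/* Older libc headers may not define SOL_NETLINK; 270 is the kernel's value. */
#ifndef SOL_NETLINK
#define SOL_NETLINK 270
#endif

/*
 * Hypothetical helper (not a libaudit function): ask the kernel not to
 * flag ENOBUFS on this netlink socket when messages are dropped because
 * the receive buffer is full.
 * Returns 0 on success, -1 with errno set on failure.
 */
int netlink_disable_enobufs(int fd)
{
    int one = 1;

    return setsockopt(fd, SOL_NETLINK, NETLINK_NO_ENOBUFS, &one, sizeof(one));
}

/*
 * Hypothetical helper: request a larger receive buffer. The kernel clamps
 * the request to net.core.rmem_max, so this only makes overflow less
 * likely; it cannot rule it out under arbitrarily heavy audit load.
 * Returns 0 on success, -1 with errno set on failure.
 */
int netlink_grow_rcvbuf(int fd, int bytes)
{
    return setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes));
}
```

Both calls would presumably be made once, right after the socket is opened and before audit_set_pid(). Note that with NETLINK_NO_ENOBUFS set, dropped messages are no longer signalled to the receiver at all, so whether that trade-off is acceptable for the daemon is part of what I'd like to discuss.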