On Tue, 09 Jun 2026 13:40:23 -0400 Steve Grubb wrote:
> > > You're right, it is. I see how this flag would fix the pathological
> > > behavior that was reported. But as I have looked at this suggestion,
> > > there seems to be one wrinkle. User space should not need to know that
> > > the audit code in the kernel has this retry mechanism.  
> > 
> > It's not about the retry mechanism, at least in my mind - I read
> > your reply as "user space should not know that there was congestion".
> > Why?  
> 
> In the audit case, it is not useful. I know there can be an endless supply 
> and there's not much that can be done except dequeueing what's next.
> 
> > It's not very useful, I get that, but user space can just clear
> > the congestion signal and keep going.  
> 
> How? The recvfrom man page doesn't even discuss ENOBUFS, which is one of the
> strongest arguments for a kernel side patch. The fact that there exists a
> socket option to declare that you do not want ENOBUFS on netlink sockets is 
> esoteric knowledge. The netlink(7) man page does cover the flag. But even 
> where it discusses ENOBUFS, it does not mention that this is preventable by 
> setting a socket option. I do appreciate this being pointed out. But getting 
> from the recvfrom man page to a solution is not obvious.

Socket errors are generally "consumed" when they are returned.
User space should see one ENOBUFS and then, once the rcvbuf
is drained completely, the CONGESTION bit should also get
auto-cleared. This is my mental model of how Netlink works; LMK if
you're seeing different behavior, in case my memory is faulty...
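A minimal sketch of what "clear the congestion signal and keep going" could look like in user space, assuming the model above holds (one ENOBUFS per overrun, consumed when returned) and that the daemon reads a datagram socket such as netlink with a non-blocking recv() loop. The function drain_socket() and struct drain_stats are hypothetical names for illustration, not auditd code:

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical counters for illustration; not part of any real API. */
struct drain_stats {
    size_t messages;   /* datagrams successfully read */
    size_t overruns;   /* ENOBUFS notices seen (messages were dropped) */
};

/*
 * Read every queued datagram from a datagram socket without blocking.
 * ENOBUFS is treated as a congestion notice: count it and keep reading,
 * since the error is consumed once it has been returned.
 * Returns 0 once the receive queue is empty, -1 on any other error.
 */
int drain_socket(int fd, char *buf, size_t len, struct drain_stats *st)
{
    for (;;) {
        ssize_t n = recv(fd, buf, len, MSG_DONTWAIT);
        if (n >= 0) {
            st->messages++;
            /* A real daemon would parse and dispatch buf[0..n) here. */
            continue;
        }
        if (errno == EINTR)
            continue;          /* interrupted by a signal: retry */
        if (errno == ENOBUFS) {
            st->overruns++;    /* some messages were lost; keep draining */
            continue;
        }
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return 0;          /* receive queue is empty */
        return -1;             /* unexpected error; errno is preserved */
    }
}
```

Draining until EAGAIN, rather than stopping at the first ENOBUFS, is what would let the CONGESTION bit clear under this model.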

> > > It seems like the audit subsystem should set the flag on auditd's
> > > socket at registration time in auditd_set(). The kernel is the right
> > > place for this because it's the kernel that manages the retry/hold
> > > queues and sets the sk_sndtimeo that triggers the overrun path -
> > > auditd has no knowledge of these internals.  
> > 
> > We have to carry this code somewhere, either in user space or in
> > the kernel. I'd prefer not to carry it in the kernel.  
> 
> I can put this in the audit daemon. But whoever else writes a similar app 
> will have to independently discover the same solution when faced with the 
> pathologically bad behavior. A kernel side fix would have made it easier for 
> future app developers to be successful.
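For the user-space route, a minimal sketch of opting a netlink socket out of ENOBUFS reports with the NETLINK_NO_ENOBUFS option documented in netlink(7). The helper name disable_enobufs() is hypothetical, and calling it right after the daemon opens its audit socket is an assumption, not a description of existing auditd code. Messages that do not fit in the receive buffer are still dropped; the option only stops the overrun from being reported:

```c
#include <sys/socket.h>      /* setsockopt(), SOL_NETLINK */
#include <linux/netlink.h>   /* NETLINK_NO_ENOBUFS */

#ifndef SOL_NETLINK
#define SOL_NETLINK 270      /* value from the kernel's linux/socket.h */
#endif

/*
 * Hypothetical helper: ask the kernel not to report receive-buffer
 * overruns on this netlink socket as ENOBUFS. Dropped messages are
 * still dropped; only the error report goes away.
 * Returns 0 on success, -1 with errno set on failure.
 */
int disable_enobufs(int fd)
{
    int one = 1;
    return setsockopt(fd, SOL_NETLINK, NETLINK_NO_ENOBUFS, &one, sizeof(one));
}
```

A daemon would call disable_enobufs() on the descriptor returned by socket(AF_NETLINK, SOCK_RAW, NETLINK_AUDIT) before entering its receive loop.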
