On Tue, Mar 31, 2026 at 3:44 PM Michal Luczaj <[email protected]> wrote:
>
> On 3/31/26 01:27, Kuniyuki Iwashima wrote:
> > On Mon, Mar 30, 2026 at 4:04 PM Michal Luczaj <[email protected]> wrote:
> >> On 3/26/26 07:26, Martin KaFai Lau wrote:
> >>> On 3/15/26 4:58 PM, Michal Luczaj wrote:
> >>>>> Beside, from looking at the may_update_sockmap(), I don't know if it is
> >>>>> even doable (or useful) to bpf_map_update_elem(unix_sk) in
> >>>>> tc/flow_dissector/xdp. One possible path is the SOCK_FILTER when looking
> >>>>> at unix_dgram_sendmsg() => sk_filter(). It was not the original use case
> >>>>> when the bpf_map_update_elem(sockmap) support was added iirc.
> >>>>
> >>>> What about a situation when unix_sk is stored in a sockmap, then tc prog
> >>>> looks it up and invokes bpf_map_update_elem(unix_sk)? I'm not sure it's
> >>>> useful, but seems doable.
> >>>
> >>> [ Sorry for the late reply ]
> >>>
> >>> It is a bummer that the bpf_map_update_elem(unix_sk) path is possible
> >>> from tc :(
> >>>
> >>> Then unix_state_lock() in its current form cannot be safely acquired in
> >>> sock_map_update_elem(). It is currently a spin_lock() instead of
> >>> spin_lock_bh().
> >>
> >> Is there a specific deadlock you have in your mind?
> >
> > lockdep would complain if we used the same lock from
> > different contexts.
> >
> > e.g.)
> > Process context holding unix_state_lock() with the normal spin_lock()
> > -> BH interrupt
> > -> tc prog trying to hold the same lock with spin_lock(). (_bh())
> > -> deadlock
>
> OK, I get it, thanks.
>
> So here's one more idea: the null-ptr-deref issue is connect() racing
> against sock_map_update_elem_*SYS*() coming from user-space, not the
> can-be-called-from-BH sock_map_update_elem() variant. So can't we assume
> that for any sock_map_update_elem(unix_sk) invoked by a tc prog, unix_sk
> will always be "stable", i.e. in a state that cannot lead to that
> null-ptr-deref?
>
> IOW, if for a tc prog the only way to get hold of unix_sk is look it up in
> a sockmap, then (by the fact that unix_sk _is_ in the sockmap) unix_sk will
> be already safe to use by sock_map_update_elem() without taking the af_unix
> state lock.
>
> Long story short: take the unix state lock in sock_map_update_elem_sys(),
> don't bother in sock_map_update_elem()?
but it will prevents bpf iter from taking unix_state_lock().
I don't see a good reason to introduce a new locking rule by unnecessarily
wrapping the entire sockmap update with unix_state_lock() even though
the root bug can be avoided by a simple null check in a leaf function.
>
> >> ...
> >> And sock_{map,hash}_seq_show() (being a part of bpf iter machinery) needs
> >> to take lock_sock() just as well? Would that require a special-casing
> >> (unix_state_lock()) for af_unix?
> >
> > Right, lock_sock() + unix_state_lock() + SOCK_DEAD check
> > should be best.
>
> OK, got it.
>