On Thu, Jan 22, 2026 at 03:56 AM GMT, Jiayuan Chen wrote:
> January 21, 2026 at 20:55, "Jiayuan Chen" <jiayuan.chen@linux.dev> wrote:
>> January 21, 2026 at 17:36, "Jakub Sitnicki" <jakub@cloudflare.com> wrote:
>> >  I've been thinking about this some more and came to the conclusion that
>> >  this udp_bpf_ioctl implementation is actually what we want, while
>> >  tcp_bpf_ioctl *should not* be checking if the sk_receive_queue is
>> >  non-empty.
>> >  
>> >  Why? Because the verdict prog might redirect or drop the skbs from
>> >  sk_receive_queue once it actually runs. The messages might never appear
>> >  on the msg_ingress queue.
>> >  
>> >  What I think we should be doing, in the end, is kicking the
>> >  sk_receive_queue processing on bpf_map_update_elem, if there's data
>> >  ready.
>> >  
>> >  The API semantics I'm proposing is:
>> >  
>> >  1. ioctl(FIONREAD) -> reports N bytes
>> >  2. bpf_map_update_elem(sk) -> socket inserted into sockmap
>> >  3. poll() for POLLIN -> wait for socket to be ready to read
>> >  4. ioctl(FIONREAD) -> report N bytes if verdict prog didn't
>> >  redirect or drop it
>> >  
>> >  We don't have to add the queue kick on map update in this series.
>> >  
>> >  If you decide to leave it for later, can I ask that you open an issue at
>> >  our GH project [1]?
>> >  
>> >  I don't want it to fall through the cracks. And I sometimes have people
>> >  asking what they could help with in sockmap.
>> >  
>> >  Thanks,
>> >  -jkbs
>> >  
>> >  [1] https://github.com/sockmap-project/sockmap-project/issues
>> > 
>> Hi Jakub,
>> 
>> Thanks for taking the time to think through this carefully. I agree with your
>> analysis - reporting sk_receive_queue length is misleading since the verdict
>> prog might redirect or drop those skbs.
>> 
>> There's no rush to merge this patch.
>> 
>> Since the queue kick on bpf_map_update_elem addresses a closely related
>> issue, I think it makes sense to include it in this patchset for easier
>> tracking rather than splitting it out.
>> 
>> I'll spend more time looking into this and come back with an updated version.
>> 
>> Thanks,
>> Jiayuan
>>
>
>
> Hi Jakub,
>
>   I've been thinking about this more, and I realize the problem is not as
>   simple as it seems.
>
>   Regarding kicking the sk_receive_queue on bpf_map_update_elem: the BPF
>   program may not be fully initialized at that point. For example, with a
>   redirect program, the destination fd might not yet be inserted into the
>   map. If we kick the data through the BPF program immediately, the
>   redirect lookup would fail, leading to unexpected behavior (data being
>   dropped or passed to the wrong socket).

I reckon there is not much we can do about it because we have no control
over when sockets are inserted into or removed from sockmap. It can
happen at any time.

Also, a newly received segment can trigger the sk_data_ready callback,
and that would also cause the skbs to get processed. We don't have
control over that either.

Does this change break any of our existing tests/benchmarks or some
other setup of yours?

>   I also considered triggering the kick in poll/select via
>   sk_msg_is_readable(). However, this approach doesn't work for TCP
>   because tcp_poll() -> tcp_stream_is_readable() -> tcp_epollin_ready()
>   will return early when sk_receive_queue has data, before ever calling
>   sk_is_readable().
>
>   In the next version, I'll address your other nits and remove the
>   sk_receive_queue check from tcp_bpf_ioctl. I'll also open an issue on
>   the GH project to track this problem so we can continue exploring
>   better solutions.

Sounds like a plan. Thanks!
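
For reference, the four-step sequence proposed earlier in the thread can be
sketched from userspace roughly as below. This is only an illustration under
stated assumptions, not a test from the series: it uses an AF_UNIX stream
socketpair so it runs without privileges, the helper names (fionread_bytes,
wait_readable, fionread_poll_demo) are made up for this example, and the
sockmap insert in step 2 is left as a comment because it needs a loaded map
and verdict program. With no verdict program attached, nothing redirects or
drops the data, so both FIONREAD calls are expected to report the same count.

```c
#include <assert.h>
#include <poll.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Return the byte count FIONREAD reports for fd, or -1 on error. */
static int fionread_bytes(int fd)
{
	int avail = 0;

	if (ioctl(fd, FIONREAD, &avail) < 0)
		return -1;
	return avail;
}

/* Return 1 if fd signals POLLIN within timeout_ms, 0 on timeout, -1 on error. */
static int wait_readable(int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	int ret = poll(&pfd, 1, timeout_ms);

	if (ret <= 0)
		return ret;
	return (pfd.revents & POLLIN) ? 1 : 0;
}

/*
 * Hypothetical demo: queue payload on one end of a socketpair, then walk
 * the receiving end through the proposed sequence. Returns the byte count
 * from the final FIONREAD, or -1 on error or if the two counts disagree.
 */
int fionread_poll_demo(const char *payload)
{
	int sv[2], before, after, ret = -1;
	size_t len;

	assert(payload != NULL);
	len = strlen(payload);

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return -1;
	if (write(sv[0], payload, len) != (ssize_t)len)
		goto out;

	/* 1. ioctl(FIONREAD) -> reports N bytes */
	before = fionread_bytes(sv[1]);

	/*
	 * 2. A real setup would insert sv[1] into a sockmap here, e.g. with
	 *    libbpf's bpf_map_update_elem(map_fd, &key, &sv[1], BPF_ANY).
	 *    Omitted: map_fd and key would come from a loaded BPF object.
	 */

	/* 3. poll() for POLLIN -> wait for the socket to be ready to read */
	if (wait_readable(sv[1], 1000) != 1)
		goto out;

	/* 4. ioctl(FIONREAD) -> N bytes unless a verdict prog took the data */
	after = fionread_bytes(sv[1]);
	if (before >= 0 && before == after)
		ret = after;
out:
	close(sv[0]);
	close(sv[1]);
	return ret;
}
```

Calling fionread_poll_demo("hello") from a small main() should return the
payload length. With a verdict program that redirects or drops, step 4 would
be expected to report fewer bytes than step 1.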
