When inet_csk_listen_stop() migrates an established child socket from
a closing listener to another socket in the same SO_REUSEPORT group,
the target listener gets a new accept-queue entry via
inet_csk_reqsk_queue_add(), but that path never notifies the target
listener's waiters. A nonblocking accept() still works because it
checks the queue directly, but poll()/epoll_wait() waiters and
blocking accept() callers can also remain asleep indefinitely.

Call READ_ONCE(nsk->sk_data_ready)(nsk) after a successful migration
in inet_csk_listen_stop().

However, after inet_csk_reqsk_queue_add() succeeds, the ref acquired
in reuseport_migrate_sock() is effectively transferred to
nreq->rsk_listener. Another CPU can then dequeue nreq via accept()
or listener shutdown, hit reqsk_put(), and drop that listener ref.
Since listeners are SOCK_RCU_FREE, wrap the post-queue_add()
dereferences of nsk in rcu_read_lock()/rcu_read_unlock(), which also
covers the existing sock_net(nsk) access in that path.

The reqsk_timer_handler() path does not need the same changes for two
reasons: half-open requests become readable only after the final ACK,
where tcp_child_process() already wakes the listener; and once nreq is
visible via inet_ehash_insert(), the success path no longer touches
nsk directly.

Fixes: 54b92e841937 ("tcp: Migrate TCP_ESTABLISHED/TCP_SYN_RECV sockets in 
accept queues.")
Cc: [email protected]
Suggested-by: Eric Dumazet <[email protected]>
Signed-off-by: Zhenzhong Wu <[email protected]>
---
 net/ipv4/inet_connection_sock.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index 4ac3ae1bc..928654c34 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -1479,16 +1479,19 @@ void inet_csk_listen_stop(struct sock *sk)
                        if (nreq) {
                                refcount_set(&nreq->rsk_refcnt, 1);
 
+                               rcu_read_lock();
                                if (inet_csk_reqsk_queue_add(nsk, nreq, child)) 
{
                                        __NET_INC_STATS(sock_net(nsk),
                                                        
LINUX_MIB_TCPMIGRATEREQSUCCESS);
                                        reqsk_migrate_reset(req);
+                                       READ_ONCE(nsk->sk_data_ready)(nsk);
                                } else {
                                        __NET_INC_STATS(sock_net(nsk),
                                                        
LINUX_MIB_TCPMIGRATEREQFAILURE);
                                        reqsk_migrate_reset(nreq);
                                        __reqsk_free(nreq);
                                }
+                               rcu_read_unlock();
 
                                /* inet_csk_reqsk_queue_add() has already
                                 * called inet_child_forget() on failure case.
-- 
2.43.0


Reply via email to