On Mon, May 18, 2026 at 08:39:37PM +0800, ziyu zhang wrote:
On Mon, 18 May 2026 at 18:16, Stefano Garzarella <[email protected]> wrote:

On Sat, May 16, 2026 at 11:47:45AM +0800, Ziyu Zhang wrote:
>vsock_poll() reads vsk->peer_shutdown before taking the socket
>lock to set EPOLLHUP and EPOLLRDHUP, then reads it again under the
>lock to report EOF readability. A shutdown packet can update
>peer_shutdown while poll is waiting for the lock, so one poll invocation
>can report EPOLLIN without the corresponding HUP/RDHUP bits.
>
>Keep non-connectible sockets on a single lockless READ_ONCE()

Should this be paired with WRITE_ONCE() on writers?

Yes, since the poll path uses lockless READ_ONCE() snapshots of
peer_shutdown, the writer side should be annotated with WRITE_ONCE() as
well. I will add that in v2.


>snapshot. For connectible sockets, defer shutdown-derived poll bits
>until after lock_sock() and use one READ_ONCE() snapshot for both EOF
>readability and HUP/RDHUP. This preserves shutdowns that arrive before
>the lock is acquired and keeps all peer-shutdown-derived bits consistent
>for a poll pass.
>
>Fixes: d021c344051a ("VSOCK: Introduce VM Sockets")
>Signed-off-by: Ziyu Zhang <[email protected]>
>---
> net/vmw_vsock/af_vsock.c | 40 ++++++++++++++++++++++++++--------------
> 1 file changed, 26 insertions(+), 14 deletions(-)
>
>diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
>index adcba1b7b..bed42347b 100644
>--- a/net/vmw_vsock/af_vsock.c
>+++ b/net/vmw_vsock/af_vsock.c
>@@ -1122,6 +1122,25 @@ static int vsock_shutdown(struct socket *sock, int mode)
>       return err;
> }
>
>+static __poll_t vsock_poll_shutdown(struct sock *sk, u32 peer_shutdown)
>+{
>+      __poll_t mask = 0;
>+
>+      /* INET sockets treat local write shutdown and peer write shutdown as a
>+       * case of EPOLLHUP set.
>+       */
>+      if (sk->sk_shutdown == SHUTDOWN_MASK ||
>+          ((sk->sk_shutdown & SEND_SHUTDOWN) &&
>+           (peer_shutdown & SEND_SHUTDOWN)))
>+              mask |= EPOLLHUP;
>+
>+      if (sk->sk_shutdown & RCV_SHUTDOWN ||
>+          peer_shutdown & SEND_SHUTDOWN)
>+              mask |= EPOLLRDHUP;
>+
>+      return mask;
>+}
>+
> static __poll_t vsock_poll(struct file *file, struct socket *sock,
>                              poll_table *wait)
> {
>@@ -1139,19 +1158,9 @@ static __poll_t vsock_poll(struct file *file, struct 
socket *sock,
>               /* Signify that there has been an error on this socket. */
>               mask |= EPOLLERR;
>
>-      /* INET sockets treat local write shutdown and peer write shutdown as a
>-       * case of EPOLLHUP set.
>-       */
>-      if ((sk->sk_shutdown == SHUTDOWN_MASK) ||
>-          ((sk->sk_shutdown & SEND_SHUTDOWN) &&
>-           (vsk->peer_shutdown & SEND_SHUTDOWN))) {
>-              mask |= EPOLLHUP;
>-      }
>-
>-      if (sk->sk_shutdown & RCV_SHUTDOWN ||
>-          vsk->peer_shutdown & SEND_SHUTDOWN) {
>-              mask |= EPOLLRDHUP;
>-      }
>+      if (!sock_type_connectible(sk->sk_type))
>+              mask |= vsock_poll_shutdown(sk,
>+                                          READ_ONCE(vsk->peer_shutdown));

Can we move this in the `if (sock->type == SOCK_DGRAM)` branch ?

Not a strong opinion about that, but in any case IMO we should add a
comment here to explain why we are doing only for not connectible
sockets.

That said, if we use WRITE_ONCE in the writers, do we really need to
move this after the lock_sock for the connectable ones?

Yes, I will move the non-connectible handling into the SOCK_DGRAM branch
and add a comment there.

For connectible sockets, my current understanding is that
READ_ONCE()/WRITE_ONCE() make the individual lockless accesses explicit, but
they do not make two separate reads in one vsock_poll() invocation observe the
same peer_shutdown value. So I still think using one peer_shutdown snapshot
after lock_sock() is useful for keeping the returned mask internally
consistent. Please let me know if you think WRITE_ONCE() is enough for this
case.

What will be the issue of "do not make two separate reads in one vsock_poll() invocation observe the same peer_shutdown value." ?

In any case, I'm not against it; I just want to understand the issue better.

Thanks,
Stefano


Reply via email to