On Mon, May 18, 2026 at 08:39:37PM +0800, ziyu zhang wrote:
On Mon, 18 May 2026 at 18:16, Stefano Garzarella <[email protected]> wrote:
On Sat, May 16, 2026 at 11:47:45AM +0800, Ziyu Zhang wrote:
>vsock_poll() reads vsk->peer_shutdown before taking the socket
>lock to set EPOLLHUP and EPOLLRDHUP, then reads it again under the
>lock to report EOF readability. A shutdown packet can update
>peer_shutdown while poll is waiting for the lock, so one poll invocation
>can report EPOLLIN without the corresponding HUP/RDHUP bits.
>
>Keep non-connectible sockets on a single lockless READ_ONCE()
Should this be paired with WRITE_ONCE() on writers?
Yes, since the poll path uses lockless READ_ONCE() snapshots of
peer_shutdown, the writer side should be annotated with WRITE_ONCE() as
well. I will add that in v2.
>snapshot. For connectible sockets, defer shutdown-derived poll bits
>until after lock_sock() and use one READ_ONCE() snapshot for both EOF
>readability and HUP/RDHUP. This preserves shutdowns that arrive before
>the lock is acquired and keeps all peer-shutdown-derived bits consistent
>for a poll pass.
>
>Fixes: d021c344051a ("VSOCK: Introduce VM Sockets")
>Signed-off-by: Ziyu Zhang <[email protected]>
>---
> net/vmw_vsock/af_vsock.c | 40 ++++++++++++++++++++++++++--------------
> 1 file changed, 26 insertions(+), 14 deletions(-)
>
>diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
>index adcba1b7b..bed42347b 100644
>--- a/net/vmw_vsock/af_vsock.c
>+++ b/net/vmw_vsock/af_vsock.c
>@@ -1122,6 +1122,25 @@ static int vsock_shutdown(struct socket *sock, int mode)
> return err;
> }
>
>+static __poll_t vsock_poll_shutdown(struct sock *sk, u32 peer_shutdown)
>+{
>+ __poll_t mask = 0;
>+
>+ /* INET sockets treat local write shutdown and peer write shutdown as a
>+ * case of EPOLLHUP set.
>+ */
>+ if (sk->sk_shutdown == SHUTDOWN_MASK ||
>+ ((sk->sk_shutdown & SEND_SHUTDOWN) &&
>+ (peer_shutdown & SEND_SHUTDOWN)))
>+ mask |= EPOLLHUP;
>+
>+ if (sk->sk_shutdown & RCV_SHUTDOWN ||
>+ peer_shutdown & SEND_SHUTDOWN)
>+ mask |= EPOLLRDHUP;
>+
>+ return mask;
>+}
>+
> static __poll_t vsock_poll(struct file *file, struct socket *sock,
> poll_table *wait)
> {
>@@ -1139,19 +1158,9 @@ static __poll_t vsock_poll(struct file *file, struct
socket *sock,
> /* Signify that there has been an error on this socket. */
> mask |= EPOLLERR;
>
>- /* INET sockets treat local write shutdown and peer write shutdown as a
>- * case of EPOLLHUP set.
>- */
>- if ((sk->sk_shutdown == SHUTDOWN_MASK) ||
>- ((sk->sk_shutdown & SEND_SHUTDOWN) &&
>- (vsk->peer_shutdown & SEND_SHUTDOWN))) {
>- mask |= EPOLLHUP;
>- }
>-
>- if (sk->sk_shutdown & RCV_SHUTDOWN ||
>- vsk->peer_shutdown & SEND_SHUTDOWN) {
>- mask |= EPOLLRDHUP;
>- }
>+ if (!sock_type_connectible(sk->sk_type))
>+ mask |= vsock_poll_shutdown(sk,
>+ READ_ONCE(vsk->peer_shutdown));
Can we move this in the `if (sock->type == SOCK_DGRAM)` branch ?
Not a strong opinion about that, but in any case IMO we should add a
comment here to explain why we are doing only for not connectible
sockets.
That said, if we use WRITE_ONCE in the writers, do we really need to
move this after the lock_sock for the connectable ones?
Yes, I will move the non-connectible handling into the SOCK_DGRAM branch
and add a comment there.
For connectible sockets, my current understanding is that
READ_ONCE()/WRITE_ONCE() make the individual lockless accesses explicit, but
they do not make two separate reads in one vsock_poll() invocation observe the
same peer_shutdown value. So I still think using one peer_shutdown snapshot
after lock_sock() is useful for keeping the returned mask internally
consistent. Please let me know if you think WRITE_ONCE() is enough for this
case.
What will be the issue of "do not make two separate reads in one
vsock_poll() invocation observe the same peer_shutdown value." ?
In any case, I'm not against it; I just want to understand the issue
better.
Thanks,
Stefano