On 6/30/19 12:54 PM, Richard W.M. Jones wrote:
> On Thu, Jun 27, 2019 at 10:18:30PM -0500, Eric Blake wrote:
>> +  /* Queue up a write command so large that we block on POLLIN, then queue
>> +   * multiple disconnects. XXX The last one should fail.
>> +   */
>> +  if (nbd_aio_pwrite (nbd, buf, 2 * 1024 * 1024, 0, 0) == -1) {
>> +    fprintf (stderr, "%s: %s\n", argv[0], nbd_get_error ());
>> +    exit (EXIT_FAILURE);
>> +  }
>> +  if ((nbd_aio_get_direction (nbd) & LIBNBD_AIO_DIRECTION_WRITE) == 0) {
>> +    fprintf (stderr, "%s: test failed: "
>> +             "expect to be blocked on write\n",
>> +             argv[0]);
>> +    exit (EXIT_FAILURE);
>> +  }
>
> This test fails when run under valgrind.  An abbreviated log shows
> what's happening:
>
> libnbd: debug: nbd_aio_pwrite: event CmdIssue: READY -> ISSUE_COMMAND.START
> libnbd: debug: nbd_aio_pwrite: transition: ISSUE_COMMAND.START -> ISSUE_COMMAND.SEND_REQUEST
> libnbd: debug: nbd_aio_pwrite: transition: ISSUE_COMMAND.SEND_REQUEST -> ISSUE_COMMAND.PREPARE_WRITE_PAYLOAD
> libnbd: debug: nbd_aio_pwrite: transition: ISSUE_COMMAND.PREPARE_WRITE_PAYLOAD -> ISSUE_COMMAND.SEND_WRITE_PAYLOAD
> libnbd: debug: nbd_aio_pwrite: transition: ISSUE_COMMAND.SEND_WRITE_PAYLOAD -> ISSUE_COMMAND.FINISH
> libnbd: debug: nbd_aio_pwrite: transition: ISSUE_COMMAND.FINISH -> READY
> /home/rjones/d/libnbd/tests/.libs/lt-errors: test failed: expect to be blocked on write
>
> It seems as if this is caused by valgrinded code running more slowly,
> rather than an actual valgrind/memory error.

Or even that valgrind's interception of send()/recv() performs buffering
differently than we get by default from the kernel.  I don't know if
running strace on valgrind is a sensible enough thing to do to see
syscall behavior?

>
> I wonder if we could remove the race using a custom nbdkit-sh-plugin
> which would block on writes until (eg) a local trigger file was
> touched?  Even that seems as if it would depend on the amount of data
> that the kernel is able to buffer.

I don't know how to make an nbdkit plugin stop the code in nbdkit/server
from read()ing from the client (the plugin code doesn't get to run until
the core has learned that the client wants a command serviced).  But it
may be possible to tweak things to send back-to-back write requests,
where even if the first write request gets sent completely, the plugin
can delay responding to that first write and use --filter=noparallel to
prevent the second command from reaching nbdkit.

I'll play with that, to see if I can reproduce the valgrind race, as
well as work around it with back-to-back write commands to increase the
likelihood of actually preventing nbdkit from consuming the second command.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org
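
Footnote on the kernel-buffering point above: a minimal, self-contained
sketch (not libnbd code, and not part of the patch) that measures how
much of a large write a non-blocking AF_UNIX socket accepts while the
peer reads nothing.  The helper names send_until_blocked and
probe_buffered are made up for illustration, and using a socketpair to
stand in for the libnbd-to-nbdkit connection is an assumption about the
transport.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: push up to len bytes into a non-blocking socket.
 * Returns the number of bytes the kernel accepted before send() reported
 * EAGAIN/EWOULDBLOCK (or len if everything fit), or -1 on any other error.
 */
static ssize_t
send_until_blocked (int fd, const char *buf, size_t len)
{
  size_t sent = 0;

  assert (buf != NULL);
  while (sent < len) {
    ssize_t r = send (fd, buf + sent, len - sent, 0);
    if (r == -1) {
      if (errno == EINTR)
        continue;               /* interrupted, just retry */
      if (errno == EAGAIN || errno == EWOULDBLOCK)
        break;                  /* kernel buffer is full: we would block */
      return -1;
    }
    sent += (size_t) r;
  }
  return (ssize_t) sent;
}

/* Hypothetical helper: create a socketpair whose far end never reads,
 * and report how many of len bytes were buffered before blocking.
 * Returns -1 on setup failure.
 */
static ssize_t
probe_buffered (size_t len)
{
  int sv[2];
  int flags;
  char *buf;
  ssize_t n = -1;

  if (socketpair (AF_UNIX, SOCK_STREAM, 0, sv) == -1)
    return -1;

  flags = fcntl (sv[0], F_GETFL);
  if (flags != -1 && fcntl (sv[0], F_SETFL, flags | O_NONBLOCK) != -1) {
    buf = calloc (1, len);
    if (buf != NULL) {
      n = send_until_blocked (sv[0], buf, len);
      free (buf);
    }
  }

  close (sv[0]);
  close (sv[1]);
  return n;
}
```

Calling probe_buffered (2 * 1024 * 1024) from a small main() and
printing the result shows how much the kernel buffered on a given
machine.  If that figure ever reached the full request size, the whole
payload would be accepted in one go, and presumably a check like the one
in the test would no longer see a pending write.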
_______________________________________________
Libguestfs mailing list
Libguestfs@redhat.com
https://www.redhat.com/mailman/listinfo/libguestfs