On Sat, May 21, 2022 at 05:37:10PM +0100, Nikolaus Rath wrote:
> On May 21 2022, "Richard W.M. Jones" <rjo...@redhat.com> wrote:
> > On Sat, May 21, 2022 at 01:21:11PM +0100, Nikolaus Rath wrote:
> >> Hi,
> >>
> >> How does the blocksize filter take into account writes that end-up
> >> overlapping due to read-modify-write cycles?
> >>
> >> Specifically, suppose there are two non-overlapping writes handled
> >> by two different threads, that, due to blocksize requirements,
> >> overlap when expanded. I think there is a risk that one thread may
> >> partially undo the work of the other here.
> >>
> >> Looking at the code, it seems that writes of unaligned heads and
> >> tails are protected with a global lock., but writes of aligned data
> >> can occur concurrently.
> >
> > I agree.
> >
> > Assuming the underlying plugin is NBDKIT_THREAD_MODEL_PARALLEL and no
> > other filters impose thread model limits, the blocksize filter does
> > not limit the thread model, so the thread model of nbdkit would also
> > be NBDKIT_THREAD_MODEL_PARALLEL.
> >
> > That means that two writes either on different connections or
> > pipelined on the same connection could happen at the same time.
> > “blocksize_pwrite” would be called concurrently for the two requests.
> >
> >> However, does this not miss the case where there is one unaligned
> >> write that overlaps with an aligned one?
> >>
> >> For example, with blocksize 10, we could have:
> >>
> >> Thread 1: receives write request for offset=0, size=10
> >> Thread 2: receives write request for offset=4, size=16
> >> Thread 1: acquires lock, reads bytes 0-4
> >> Thread 2: does aligned write (no locking needed), writes bytes 0-10
> >> Thread 1: writes bytes 0-10, overwriting data from Thread 2
> >
> > I believe this analysis is correct. (CC'd to Eric who knows a lot
> > more about this.)
> >
> > However I don't think it's a bug. If a client doesn't want writes to
> > squash each other, then it shouldn't send overlapping requests. I bet
> > the same thing happens with an SSD.
>
> But the requests are not overlapping from the client point of view. They
> only become overlapping when the server applies its read-modify-write
> operation to align them to the blocksize.

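To make the read-modify-write hazard concrete, here is a toy model in Python. It is only a sketch and not nbdkit code: the names BLOCK, rmw_steps and run are hypothetical, the "disk" is a single 10-byte block, and each write is assumed to fit inside that block. It shows the simpler case of two sub-block writes that do not overlap from the client's side (bytes 0-3 and bytes 4-9) but both expand to the same block. That is the case the thread says the filter already guards with a global lock; the serialized schedule stands in for holding such a lock across the whole cycle.

```python
BLOCK = 10  # block size from the example in the thread


def rmw_steps(disk, offset, data):
    """Read-modify-write of one block, pausing after the read.

    Written as a generator so a caller can interleave two writers
    deterministically. Assumes the write fits inside a single block.
    """
    start = (offset // BLOCK) * BLOCK
    block = bytearray(disk[start:start + BLOCK])  # read the whole block
    yield                                         # another writer may run here
    rel = offset - start
    block[rel:rel + len(data)] = data             # modify our bytes only
    disk[start:start + BLOCK] = block             # write the whole block back


def run(interleaved):
    """Apply two client-side non-overlapping writes to a fresh block."""
    disk = bytearray(b"." * BLOCK)
    a = rmw_steps(disk, 0, b"AAAA")    # bytes 0-3
    b = rmw_steps(disk, 4, b"BBBBBB")  # bytes 4-9
    # Interleaved: both read before either writes (no serialization).
    # Serialized: each cycle completes before the next one starts.
    order = [a, b, a, b] if interleaved else [a, a, b, b]
    for writer in order:
        next(writer, None)  # advance one step; ignore normal completion
    return bytes(disk)


print(run(interleaved=True))   # b'....BBBBBB'  (A's bytes were lost)
print(run(interleaved=False))  # b'AAAABBBBBB'
```

In the interleaved schedule the second writer puts back a block it read before the first writer's update landed, so the first update is silently undone even though the two requests never touched the same bytes.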
I'm going to leave this one to Eric who's an expert on this ("write
tearing", I think).

> I think you elsewhere said that the blocksize reported by the NBD server
> is only a preferred blocksize, so I'd be surprised if not following this
> "preference" results in data corruption.

This is true for NBD at the moment, but I think everyone accepts it's a
mistake in the protocol. Eric was looking into this too.

> > NBD_CMD_FLAG_FUA is provided for clients that wish to ensure that a
> > write has been committed before sending another request.
> >
> > Do you have an example of a client which sends overlapping requests
> > and depends on particular behaviour of the server? You may be able to
> > get it to work by using nbdkit-noparallel-filter which can be used to
> > serialize nbdkit.
>
> I'm working with the kernel's NBD client, and it would explain all the
> mysterious data corruption issues that I've seen with the S3 plugin. But
> I have not yet confirmed definitely that this is the root cause.
>
> For now, I'll avoid the blocksize filter and instead do the
> read-modify-write in the plugin with proper locking. If that fixes it,
> then I think we can conclude that the kernel is sending such requests
> (but, as I said above, I would not consider them overlapping nor would I
> consider this a bug).

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
nbdkit - Flexible, fast NBD server with plugins
https://gitlab.com/nbdkit/nbdkit