On Mon, Jun 13, 2022 at 01:25:39PM -0400, Josef Bacik wrote:
> On Mon, Jun 13, 2022 at 6:24 AM Richard W.M. Jones <[email protected]> wrote:
> >
> > On Mon, Jun 13, 2022 at 10:33:58AM +0100, Nikolaus Rath wrote:
> > > Hello,
> > >
> > > I am trying to improve performance of the scenario where the kernel's
> > > NBD client talks to nbdkit's S3 plugin.
> > >
> > > For me, the main bottleneck is currently due to the fact that the
> > > kernel aligns requests to only 512 B, no matter the blocksize
> > > reported by nbdkit.
> > >
> > > Using a 512 B object size is not feasible (due to latency and request
> > > overhead). However, with a larger object size there are two
> > > conflicting objectives:
> > >
> > > 1. To maximize parallelism (which is important to reduce the effects
> > > of connection latency), it's best to limit the size of the kernel's
> > > NBD requests to the object size.
> > >
> > > 2. To minimize unaligned writes, it's best to allow arbitrarily large
> > > NBD requests, because the larger the request, the larger the number
> > > of full blocks that are written. Unfortunately this means that all
> > > objects touched by the request are written sequentially.
> > >
> > > I see a number of ways to address that:
> > >
> > > 1. Change the kernel's NBD code to honor the blocksize reported by
> > > the NBD server. This would be ideal, but I don't feel up to making
> > > this happen. Theoretical solution only.
> >
> > This would be the ideal solution.  I wonder how technically
> > complicated it would actually be?
> >
> > AIUI you'd have to modify nbd-client to query the block limits from
> > the server, which is the hardest part of this, but it's all userspace
> > code.  Then you'd pass those down to the kernel via the ioctl (see
> > drivers/block/nbd.c:__nbd_ioctl).  Then inside the kernel you'd call
> > blk_queue_io_min & blk_queue_io_opt with the values (I'm not sure how
> > you set the max request size, or if that's possible).
> > See block/blk-settings.c for details of these functions.
>
> Exactly this.  The kernel just does what the client tells it to do,
> and the kernel can be configured for whatever blocksize.
> Unfortunately there's no way for the server to advertise to the
> client what to do; you have to configure it on the client.  Adding
> some code to the userspace negotiation that happens, to pull the
> blocksize, is the right thing to do here: simply pass it into the
> configuration stuff in nbd-client, which uses the appropriate
> netlink tag to set the blocksize.
For context, the NBD protocol can now advertise minimum, preferred and
maximum block sizes during the initial handshake:

https://github.com/NetworkBlockDevice/nbd/blob/master/doc/proto.md#block-size-constraints

nbdkit (since 1.30) supports this, for example:

$ nbdkit eval get_size='echo 256M' block_size='echo 64k 1M 32M'
$ nbdinfo nbd://localhost
protocol: newstyle-fixed without TLS
export="":
	export-size: 268435456 (256M)
	uri: nbd://localhost:10809/
	contexts:
		base:allocation
	is_rotational: false
	is_read_only: true
	can_cache: false
	can_df: true
	can_fast_zero: false
	can_flush: false
	can_fua: false
	can_multi_conn: false
	can_trim: false
	can_zero: false
	block_size_minimum: 65536      <---
	block_size_preferred: 1048576  <---
	block_size_maximum: 33554432   <---

Rich.

> > As a quick test you could try calling blk_queue_io_* in the kernel
> > driver with hard-coded values, to see if that modifies the requests
> > that are seen by nbdkit.  Should give you some confidence before
> > making the full change.
> >
> > BTW I notice that the kernel NBD driver always reports that it's a
> > non-rotational device, ignoring the server setting ...
>
> That I can fix easily, I'll get that done.  Thanks,
>
> Josef

--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-builder quickly builds VMs from scratch
http://libguestfs.org/virt-builder.1.html

_______________________________________________
Libguestfs mailing list
[email protected]
https://listman.redhat.com/mailman/listinfo/libguestfs
