On Wed, May 07, 2025 at 03:49:42PM -0600, Uday Shankar wrote:
> +Load balancing
> +--------------
> +
> +A simple approach to designing a ublk server might involve selecting a
> +number of I/O handler threads N, creating devices with N queues, and
> +pairing up I/O handler threads with queues, so that each thread gets a
> +unique qid, and it issues ``FETCH_REQ``s against all tags for that qid.

``FETCH_REQ``\s (escape s)

> +Indeed, before the introduction of the ``UBLK_F_RR_TAGS`` feature, this
> +was essentially the only option (*)

Use reST footnotes syntax, i.e.:

---- >8 ----
diff --git a/Documentation/block/ublk.rst b/Documentation/block/ublk.rst
index 440b63be4ea8b6..b1d29fceff4e80 100644
--- a/Documentation/block/ublk.rst
+++ b/Documentation/block/ublk.rst
@@ -325,7 +325,7 @@ number of I/O handler threads N, creating devices with N queues, and
 pairing up I/O handler threads with queues, so that each thread gets a
 unique qid, and it issues ``FETCH_REQ``\s against all tags for that qid.
 Indeed, before the introduction of the ``UBLK_F_RR_TAGS`` feature, this
-was essentially the only option (*)
+was essentially the only option [#]_
 
 This approach can run into performance issues under imbalanced load.
 This architecture taken together with the `blk-mq architecture
@@ -368,8 +368,8 @@ With this setup, I/O submitted on a CPU which maps to queue 0 will be
 balanced across all threads instead of all landing on the same thread.
 Thus, a potential bottleneck is avoided.
 
-(*) technically, one I/O handling thread could service multiple queues
-if it wanted to, but that doesn't help with imbalanced load
+.. [#] Technically, one I/O handling thread could service multiple queues
+       if it wanted to, but that doesn't help with imbalanced load
 
 Zero copy
 ---------

> +
> +This approach can run into performance issues under imbalanced load.
> +This architecture taken together with the `blk-mq architecture
> +<https://docs.kernel.org/block/blk-mq.html>`_ implies that there is a

This architecture, taken together with the
:doc:`blk-mq architecture </block/blk-mq>`, implies that ...

> +fixed mapping from I/O submission CPU to the ublk server thread that
> +handles it. If the workload is CPU-bottlenecked, only allowing one ublk
> +server thread to handle all the I/O generated from a single CPU can
> +limit peak bandwidth.
> +
> <snipped>...

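As an aside, the fixed-mapping argument in the quoted paragraph can be made concrete with a small sketch. This is not part of the patch and not ublk code: the function names are hypothetical, and the modulo CPU-to-queue mapping is an assumption for illustration only (the real blk-mq mapping depends on CPU topology). It only shows that, with one handler thread per queue, every I/O submitted from a given CPU ends up on the same thread.

```python
from collections import Counter


def cpu_to_queue(cpu, nr_queues):
    # Assumption for illustration: a plain modulo mapping from the
    # submitting CPU to a hardware queue. Real blk-mq maps differ.
    return cpu % nr_queues


def thread_for_io(cpu, nr_queues):
    # One handler thread per queue, as in the simple design described
    # in the quoted text: the thread id is the qid.
    return cpu_to_queue(cpu, nr_queues)


def thread_load(submitting_cpus, nr_queues):
    # Count how many I/Os each handler thread receives.
    return Counter(thread_for_io(cpu, nr_queues) for cpu in submitting_cpus)


if __name__ == "__main__":
    # Imbalanced load: eight I/Os, all submitted from CPU 0.
    print(dict(thread_load([0] * 8, nr_queues=4)))
    # Balanced load: one I/O from each of CPUs 0..3.
    print(dict(thread_load([0, 1, 2, 3], nr_queues=4)))
```

In the imbalanced case a single thread receives all eight I/Os while the other three stay idle, which is the bottleneck the paragraph describes.
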
> +With these changes, a ublk server can balance load as follows:
> +
> +- create the device with ``UBLK_F_RR_TAGS`` set in
> +  ``ublksrv_ctrl_dev_info::flags`` when issuing the ``ADD_DEV`` command
> +- issue ``FETCH_REQ``s from ublk server threads to (qid,tag) pairs in
> +  a round-robin manner. For example, for a device configured with
> +  ``nr_hw_queues=2`` and ``queue_depth=4``, and a ublk server having 4
> +  I/O handling threads, ``FETCH_REQ``s could be issued as follows, where
> +  each entry in the table is the pair (``ublksrv_io_cmd::q_id``,
> +  ``ublksrv_io_cmd::tag``) in the payload of the ``FETCH_REQ``.

s/``FETCH_REQ``s/``FETCH_REQ``\s/ (escape the s after FETCH_REQ).

> +
> +  ======== ======== ======== ========
> +  thread 0 thread 1 thread 2 thread 3
> +  ======== ======== ======== ========
> +  (0, 0)   (0, 1)   (0, 2)   (0, 3)
> +  (1, 3)   (1, 0)   (1, 1)   (1, 2)

Add a table border at the bottom, i.e.:

---- >8 ----
diff --git a/Documentation/block/ublk.rst b/Documentation/block/ublk.rst
index e9cbabdd69c553..dc6fdfedba9ab4 100644
--- a/Documentation/block/ublk.rst
+++ b/Documentation/block/ublk.rst
@@ -362,6 +362,7 @@ With these changes, a ublk server can balance load as follows:
   ======== ======== ======== ========
   (0, 0)   (0, 1)   (0, 2)   (0, 3)
   (1, 3)   (1, 0)   (1, 1)   (1, 2)
+  ======== ======== ======== ========
 
 With this setup, I/O submitted on a CPU which maps to queue 0 will be
 balanced across all threads instead of all landing on the same thread.

Thanks.

-- 
An old man doll... just what I always wanted! - Clara