On 2019-08-15 1:30 p.m., Bart Van Assche wrote:
On 8/13/19 9:19 PM, Douglas Gilbert wrote:
Bart Van Assche hinted at a better API design but didn't present
it. If he did, that would be the first time an alternate API
design was presented for async usage in the 20 years that I have
been associated with the driver.

I would like to start from the use cases instead of the implementation of a new SG/IO interface. My employer uses the SG/IO interface for controlling SMR and

There is no "new" SG/IO interface. Linux has broken the ability of char
drivers to safely use the read() and write() system calls. This
adversely impacts the bsg and sg drivers. In response the following
replacement mappings have been suggested in my first sg patchset:

1) For sg driver currently in production, its async interface:
       write(sg_fd, &sg_v3_obj, sizeof(sg_v3_obj))
         ----->  ioctl(sg_fd, SG_IOSUBMIT_V3, &sg_v3_obj)
       read(sg_fd, &sg_v3_obj, sizeof(sg_v3_obj))
         ----->  ioctl(sg_fd, SG_RECEIVE_V3, &sg_v3_obj)

   And send out a WARN_ONCE when write(sg_fd, &sg_v3_obj,...) is used.

2) For the async portion of the bsg driver that was removed last
   year, the following, slightly more complex mapping is proposed:
       write(bsg_fd, &sg_v4_obj, sizeof(sg_v4_obj))
         ----->  ioctl(sg_fd_equiv_bsg, SG_IOSUBMIT, &sg_v4_obj)
       read(bsg_fd, &sg_v4_obj, sizeof(sg_v4_obj))
         ----->  ioctl(sg_fd_equiv_bsg, SG_RECEIVE, &sg_v4_obj)

   The bsg_fd --> sg_fd_equiv_bsg mapping can be done with the help
   of sysfs.

There is another case with the bsg async interface where the third
argument to write() and read() is a multiple of the size of a sg_v4_obj.
I call that a multiple requests invocation. That is handled in my second
patchset with an extra level of indirection. Yes, that is a change in
the API, but it is more on the syntax side rather than the semantics side.

The ioctls have another advantage over the write()/read() interface.
The reader will notice the both SG_IOSUBMIT and SG_IORECEIVE are defined
with the _IOWR() macro indicating bi-directional dataflow. The "reverse"
direction dataflow for the submit side is when a tag is sent back from
the block layer. For the receive side the reverse flow is when matching
either by pack_id or tag.

Also some longstanding features of the sg async API such as
ioctl(SG_GET_NUM_WAITING) can lead to a reduction in API traffic. Say
we have 20 SCSI commands that don't depend on one another (e.g. READ
GATHERED). They could be submitted asynchronously with a single
multiple requests invocation by ioctl(SG_IOSUBMIT) with the flag
SGV4_FLAG_IMMED set. The user code could then wait for one (any one)
to finish and process it (so that is two API calls so far). Now an
ioctl(SG_GET_NUM_WAITING) could be issued and say it gets 3 then a
multiple requests invocation of ioctl(SG_IORECEIVE) for those 3
could be sent and complete promptly. Now the tally of API calls is
up to 4. If another ioctl(SG_GET_NUM_WAITING) was issued and say
it yielded 16 then a multiple requests invocation of
ioctl(SG_IORECEIVE) for those 16 would complete the originally
submitted 20 SCSI commands. The total tally of API calls is 6 with
only 1 of those waiting. The wait could be made fully async by
using a polling loop or a signal to replace that (and any other)

If the user space didn't mind blocking then the whole 20 SCSI commands
could be processed efficiently with a single multiple requests
invocation using ioctl(SG_IOSUBMIT) with the SGV4_FLAG_IMMED flag
cleared. It would first issue all 20 command then return after all 20
commands were complete. That is an extension of the removed bsg async
SCSI API, but a pretty good one IMO.

The sg driver's async model remains basically the same as when the
driver first appeared in 1992. Naturally there have been enhancements
along the way, such as that last example.

HSMR disks. What we need is the ability to discover, read, write and configure such disks, support for the non-standard HSMR flex protocol, the ability to give certain users or groups access to a subset of the LBAs and also the ability to make that information persistent. I think that such functionality could be implemented by extending LVM and by adding support for all ZBC commands we need in the block layer, device mapper layer and also in the asynchronous I/O layer. The block, dm and aio layers already support submitting commands asynchronously but do not yet support all the ZBC commands that we use.

I believe that you will find that the more layers of abstraction that are
placed between the actual device and the OS level API, the more difficult
the discovery process will be. And in some cases you will need to get to
a management layer to let those management functions "pass-through" those
layers. Some RAID card drivers take advantage of the no_uld_attach flag in
scsi_device to expose real devices, but only to the sg/bsg interface for
management purposes (for utilities like smartmontools) and do not produce
sd device nodes.

Are there any SG/IO use cases that have not yet been mentioned in this e-mail thread? If SMR and HSMR are the primary use cases for SG/IO, should asynchronous command support be added in the SG/IO layer or should rather ZBC support in the block, dm and aio layers be improved?

My guess is quite a few, and the companies involved don't want to talk about
their use publicly. For example when a big computer company starts reporting
errors, I believe my role is to try and fix the errors, not to interrogate
them about how and why they are using the driver. On the other hand, Tony
Battersby has been relatively active on this list and produced patches for
the sg driver over several years. Tony is well positioned to know the
driver's  strengths and weaknesses but has said that he has little time to
review these patchsets. I appreciate any feedback I can get from him.

Doug Gilbert

Reply via email to