On Wed, Mar 26, 2025 at 11:54:16AM -0600, Uday Shankar wrote: > On Wed, Mar 26, 2025 at 01:38:35PM +0800, Ming Lei wrote: > > On Tue, Mar 25, 2025 at 04:19:34PM -0600, Uday Shankar wrote: > > > There are currently two ways in which ublk server exit is detected by > > > ublk_drv: > > > > > > 1. uring_cmd cancellation. If there are any outstanding uring_cmds which > > > have not been completed to the ublk server when it exits, io_uring > > > calls the uring_cmd callback with a special cancellation flag as the > > > issuing task is exiting. > > > 2. I/O timeout. This is needed in addition to the above to handle the > > > "saturated queue" case, when all I/Os for a given queue are in the > > > ublk server, and therefore there are no outstanding uring_cmds to > > > cancel when the ublk server exits. > > > > > > The second method detects ublk server exit only after a long delay > > > (~30s, the default timeout assigned by the block layer). Any > > > applications using the ublk device will be left hanging for these 30s > > > before seeing an error/knowing anything went wrong. This problem is > > > illustrated by running the new test_generic_02 against a ublk_drv which > > > doesn't have the fix: > > > > > > selftests: ublk: test_generic_02.sh > > > dev id is 0 > > > dd: error writing '/dev/ublkb0': Input/output error > > > 1+0 records in > > > 0+0 records out > > > 0 bytes copied, 30.0611 s, 0.0 kB/s > > > DEAD > > > dd took 31 seconds to exit (>= 5s tolerance)! > > > generic_02 : [FAIL] > > > > > > Fix this by instead handling the saturated queue case in the ublk > > > character file release callback. This happens during ublk server exit > > > and handles the issue much more quickly than an I/O timeout: > > > > Another solution is to override default 30sec 'timeout'. > > Yes, but that still will introduce unnecessary delays, since it is a > polling-based solution (very similar to monitor_work we used to have). > Also it will add complexity to the unprivileged case, since that > actually cares about timeout and we will have to track the "real" > timeout separately. > > > > > > > > > selftests: ublk: test_generic_02.sh > > > dev id is 0 > > > dd: error writing '/dev/ublkb0': Input/output error > > > 1+0 records in > > > 0+0 records out > > > 0 bytes copied, 0.0376731 s, 0.0 kB/s > > > DEAD > > > generic_02 : [PASS] > > > > > > Signed-off-by: Uday Shankar <ushan...@purestorage.com> > > > --- > > > drivers/block/ublk_drv.c | 40 > > > +++++++++++------------ > > > tools/testing/selftests/ublk/Makefile | 1 + > > > tools/testing/selftests/ublk/kublk.c | 3 ++ > > > tools/testing/selftests/ublk/kublk.h | 3 ++ > > > tools/testing/selftests/ublk/null.c | 4 +++ > > > tools/testing/selftests/ublk/test_generic_02.sh | 43 > > > +++++++++++++++++++++++++ > > > 6 files changed, 72 insertions(+), 22 deletions(-) > > > > > > diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c > > > index > > > c060da409ed8a888b7e414c9065efd2cbd6d57d7..1816b2cac01056dc9d01455759594af43c5f78d6 > > > 100644 > > > --- a/drivers/block/ublk_drv.c > > > +++ b/drivers/block/ublk_drv.c > > > @@ -1247,8 +1247,6 @@ static void ublk_queue_cmd(struct ublk_queue *ubq, > > > struct request *rq) > > > static enum blk_eh_timer_return ublk_timeout(struct request *rq) > > > { > > > struct ublk_queue *ubq = rq->mq_hctx->driver_data; > > > - unsigned int nr_inflight = 0; > > > - int i; > > > > > > if (ubq->flags & UBLK_F_UNPRIVILEGED_DEV) { > > > if (!ubq->timeout) { > > > @@ -1259,26 +1257,6 @@ static enum blk_eh_timer_return > > > ublk_timeout(struct request *rq) > > > return BLK_EH_DONE; > > > } > > > > > > - if (!ubq_daemon_is_dying(ubq)) > > > - return BLK_EH_RESET_TIMER; > > > - > > > - for (i = 0; i < ubq->q_depth; i++) { > > > - struct ublk_io *io = &ubq->ios[i]; > > > - > > > - if (!(io->flags & UBLK_IO_FLAG_ACTIVE)) > > > - nr_inflight++; > > > - } > > > - > > > - /* cancelable uring_cmd can't help us if all commands are in-flight */ > > > - if (nr_inflight == ubq->q_depth) { > > > - struct ublk_device *ub = ubq->dev; > > > - > > > - if (ublk_abort_requests(ub, ubq)) { > > > - schedule_work(&ub->nosrv_work); > > > - } > > > - return BLK_EH_DONE; > > > - } > > > - > > > return BLK_EH_RESET_TIMER; > > > } > > > > > > @@ -1351,6 +1329,24 @@ static int ublk_ch_open(struct inode *inode, > > > struct file *filp) > > > static int ublk_ch_release(struct inode *inode, struct file *filp) > > > { > > > struct ublk_device *ub = filp->private_data; > > > + bool need_schedule = false; > > > + int i; > > > + > > > + /* > > > + * Error out any requests outstanding to the ublk server. This > > > + * may have happened already (via uring_cmd cancellation), in > > > + * which case it is not harmful to repeat. But uring_cmd > > > + * cancellation does not handle queues which are fully saturated > > > + * (all requests in ublk server), because from the kernel's POV, > > > + * there are no outstanding uring_cmds to cancel. This code > > > + * handles such queues. > > > + */ > > > + > > > + for (i = 0; i < ub->dev_info.nr_hw_queues; i++) > > > + need_schedule |= ublk_abort_requests(ub, ublk_get_queue(ub, i)); > > > + > > > + if (need_schedule) > > > + schedule_work(&ub->nosrv_work); > > > > ublk_abort_requests() should be called only in case of queue dying, > > since ublk server may open & close the char device multiple times. > > Sure that is technically possible, however is any real ublk server doing > this? Seems like a strange thing to do, and seems reasonable for the > driver to transition the device to the nosrv state (dead or recovery, > depending on flags) when the char device is closed, since in this case, > no one can be handling I/O anymore.
ublk server should be free to open & close the char device multiple times, but you patch limits ublk server to open & close the char device just once. The limit looks too strong... > > In general I feel like char device close is a nice place to centralize > the transition to the nosrv state. It has a few nice properties: > - Because all file references are released at this point, we're > guaranteed that all file-related activity (uring_cmds, pread/pwrite) > is quiesced. > - This one place can handle both saturated and unsaturated queues. > - It is "event-driven," i.e. our callback gets called when a certain > condition is met, instead of having to poll for a condition (like the > old monitor_work, or the timeout now) > - It looks like we can sleep in the char device close context, so we > could inline nosrv_work. I agree all above, unless: 1) open() / close() need to be allowed multiple times 2) for dealing with 1), you may have to check if queue is dying, and this way may have to use ->ubq_daemon, which is set when starting ublk, and cleared when freeing ublk char. So race is added here, which need to be addressed. > > This also is a step in the right direction IMO for resurrecting this old > work to get rid of 1:1 ublk server thread to hctx restriction > > https://lore.kernel.org/linux-block/20241002224437.3088981-1-ushan...@purestorage.com/T/#u That is definitely one good direction. > > > For understanding if queue is dying, ->ubq_damon need to be checked, > > however it may not be set yet and the current context is not same with > > the ubq_daemon context, so I feel it is a bit fragile to bring queue > > reference into ->release() callback. > > > > Many libublksrv tests are failed with this patch or kernel panic, even > > with the above check added: > > > > make test T=generic > > Thanks, I will look at and address these failures. > > Is there any plan to bring these tests into the new ublk selftests > framework? The two stress tests should be very similar with ublksrv's, just MQ isn't enabled. I will enable them later. The other big part is recovery test, which may take some time. I am a little busy recently, it is great if anyone would like to pull recovery test in. Otherwise, it may take a while. thanks, Ming