On 4/9/26 6:30 PM, John Garry wrote:
On 09/04/2026 07:37, Nilay Shroff wrote:

You mean a common blktests testcase, right?

For NVMe, that test would:
a. try to remove NVMe ko when we have the delayed removal active
b. ensure that we can queue for no path

I suppose that a common testcase could be possible (with dm mpath), but doesn't 
dm have its own testsuite?

Yes, I'd add a blktest for the 'queue_if_no_path' feature. But since we have a
separate test suite for dm under blktests, I'd first target an nvme testcase and
then later add another testcase for dm-multipath.

Testing a. effectively is a challenge, as we would typically not be able to
remove the nvme modules anyway due to many other references.

For b, how about something like the following:

set_conditions() {
     _set_nvme_trtype "$@"
}

_delayed_nvme_reconnect_ctrl() {
     sleep 2
     _nvme_connect_subsys
}

test() {
     echo "Running ${TEST_NAME}"

     _setup_nvmet

     local nvmedev
     local ns
     local bytes_written

     _nvmet_target_setup
     _nvme_connect_subsys

     # Part a: Ensure writes fail when no path returns
     nvmedev=$(_find_nvme_dev "${def_subsysnqn}")
     ns=$(_find_nvme_ns "${def_subsys_uuid}")
     echo 10 > "/sys/block/${ns}/delayed_removal_secs"
     bytes_written=$(run_xfs_io_pwritev2 /dev/"$ns" 4096)
     if [ "$bytes_written" != 4096 ]; then
         echo "could not write successfully initially"
     fi
     sleep 1
     _nvme_disconnect_ctrl "${nvmedev}"
     sleep 1
     ns=$(_find_nvme_ns "${def_subsys_uuid}")
     if [[ "${ns}" = "" ]]; then
         echo "could not find ns after disconnect"
     fi
     bytes_written=$(run_xfs_io_pwritev2 /dev/"$ns" 4096)
     if [ "$bytes_written" == 4096 ]; then
         echo "wrote successfully after disconnect"
     fi
     sleep 10
     ns=$(_find_nvme_ns "${def_subsys_uuid}")
     if [[ "${ns}" != "" ]]; then
         echo "found ns after delayed removal"
     fi

     #echo "now part 2"
     # Part b: Ensure writes work for intermittent disconnect
     _nvme_connect_subsys

     nvmedev=$(_find_nvme_dev "${def_subsysnqn}")
     ns=$(_find_nvme_ns "${def_subsys_uuid}")
     echo 10 > "/sys/block/${ns}/delayed_removal_secs"
     bytes_written=$(run_xfs_io_pwritev2 /dev/"$ns" 4096)
     if [ "$bytes_written" != 4096 ]; then
         echo "could not write successfully initially"
     fi
     sleep 1
     _nvme_disconnect_ctrl "${nvmedev}"
     sleep 1
     ns=$(_find_nvme_ns "${def_subsys_uuid}")
     if [[ "${ns}" = "" ]]; then
         echo "could not find ns after disconnect"
     fi
     _delayed_nvme_reconnect_ctrl &
     sleep 1
     bytes_written=$(run_xfs_io_pwritev2 /dev/"$ns" 4096)
     if [ "$bytes_written" != 4096 ]; then
         echo "could not write successfully with reconnect"
     fi

It seems there may be a race here if we attempt to write to $ns before
the reconnect has completed in _delayed_nvme_reconnect_ctrl.

If the intention is simply to verify that the controller reconnect occurs
within the delayed removal window and test pwrite, then it may be sufficient
to:
- perform the reconnect, and
- then validate the write (pwrite) afterwards.

In that case, we could either:
- run _delayed_nvme_reconnect_ctrl in the foreground, or
- open-code the reconnect directly in the script before issuing the write.
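
As a rough sketch of the second option (open-coding the reconnect before the
write), something like the following ordering would avoid the race. The two
helper functions here are only stand-ins with the same names as the blktests
helpers so that the snippet runs on its own, and the namespace name is a
made-up placeholder; in the real testcase the existing helpers and the looked-up
$ns would be used instead.

```shell
#!/bin/bash
# Stand-ins (assumptions) for the blktests helpers of the same name, so the
# ordering can be exercised without an NVMe target.
path_up=0
_nvme_connect_subsys() {
     path_up=1                      # stub: pretend the path is back
}
run_xfs_io_pwritev2() {
     # stub: report the requested byte count only when a path exists
     if [ "$path_up" -eq 1 ]; then echo "$2"; else echo 0; fi
}

ns="nvme0n1"                        # hypothetical namespace name

# Reconnect in the foreground, so it has completed before the write starts.
_nvme_connect_subsys

# Only now validate the write.
bytes_written=$(run_xfs_io_pwritev2 /dev/"$ns" 4096)
if [ "$bytes_written" != 4096 ]; then
     echo "could not write successfully with reconnect"
fi
```

With this ordering the trailing '&' and the extra 'sleep 1' before the write
are no longer needed.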

     sleep 10
     ns=$(_find_nvme_ns "${def_subsys_uuid}")
     if [[ "${ns}" = "" ]]; then
         echo "could not find ns after delayed reconnect"
     fi

     # Final tidy-up
     echo 0 > /sys/block/"$ns"/delayed_removal_secs
     nvmedev=$(_find_nvme_dev "${def_subsysnqn}")
     _nvme_disconnect_ctrl "${nvmedev}"
     _nvmet_target_cleanup

     echo "Test complete"
}

Otherwise overall this looks good to me.

Thanks,
--Nilay
