On Wed, Aug 14, 2019 at 12:32:44PM +0200, Martijn Coenen wrote:
> Since Android Q, the creation and configuration of loop devices is in
> the critical path of device boot. We found that the configuration of
> loop devices is pretty slow, because many ioctl()'s involve freezing the
> block queue, which in turn needs to wait for an RCU grace period. On
> Android devices we've observed up to 60ms for the creation and
> configuration of a single loop device; as we anticipate creating many
> more in the future, we'd like to avoid this delay.
> 

Another option is to not switch q_usage_counter to percpu mode until
the loop device becomes Lo_bound, so that freezing the queue before
then does not have to wait for an RCU grace period. This may be cleaner.

Something like the following patch:

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index a7461f482467..8791f9242583 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -1015,6 +1015,9 @@ static int loop_set_fd(struct loop_device *lo, fmode_t mode,
         */
        bdgrab(bdev);
        mutex_unlock(&loop_ctl_mutex);
+
+       percpu_ref_switch_to_percpu(&lo->lo_queue->q_usage_counter);
+
        if (partscan)
                loop_reread_partitions(lo, bdev);
        if (claimed_bdev)
@@ -1171,6 +1174,8 @@ static int __loop_clr_fd(struct loop_device *lo, bool release)
        lo->lo_state = Lo_unbound;
        mutex_unlock(&loop_ctl_mutex);
 
+       percpu_ref_switch_to_atomic(&lo->lo_queue->q_usage_counter, NULL);
+
        /*
         * Need not hold loop_ctl_mutex to fput backing file.
         * Calling fput holding loop_ctl_mutex triggers a circular
@@ -2003,6 +2008,12 @@ static int loop_add(struct loop_device **l, int i)
        }
        lo->lo_queue->queuedata = lo;
 
+       /*
+        * Trick the block layer into not switching q_usage_counter to
+        * percpu mode before the loop device becomes Lo_bound.
+        */
+       blk_queue_flag_set(QUEUE_FLAG_INIT_DONE, lo->lo_queue);
+
        blk_queue_max_hw_sectors(lo->lo_queue, BLK_DEF_MAX_SECTORS);
 
        /*


thanks,
Ming
