On 2025/3/26 19:04, Mikulas Patocka wrote:


On Wed, 26 Mar 2025, LongPing Wei wrote:

Calling verity_verify_io in bh for IO of all sizes is not suitable for
embedded devices. From our tests, it can improve the performance of 4K
synchronous random reads.
For example:
./fio --name=rand_read --ioengine=psync --rw=randread --bs=4K \
  --direct=1 --numjobs=8 --runtime=60 --time_based --group_reporting \
  --filename=/dev/block/mapper/xx-verity

But it will degrade the performance of 512K synchronous sequential reads
on our devices.
For example:
./fio --name=read --ioengine=psync --rw=read --bs=512K --direct=1 \
  --numjobs=8 --runtime=60 --time_based --group_reporting \
  --filename=/dev/block/mapper/xx-verity

A parameter array is introduced by this change, and users can modify the
default config via /sys/module/dm_verity/parameters/use_bh_bytes.

The default limits for NONE/RT/BE/IDLE are set to 4096.
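
For illustration, the limits could be inspected and changed from a shell
roughly as sketched below. The sysfs path is the one given above; the
helper names show_limits and set_all_limits are hypothetical, and the
sketch assumes the array is exposed as comma-separated values in
NONE,RT,BE,IDLE order, as kernel module parameter arrays generally are.

```shell
# Path taken from the patch description; override PARAM to test elsewhere.
PARAM=${PARAM:-/sys/module/dm_verity/parameters/use_bh_bytes}

# Print one "CLASS bytes" line per array element.
# Assumption: values are comma-separated, in NONE,RT,BE,IDLE order.
show_limits() {
    set -- NONE RT BE IDLE
    for v in $(tr ',' ' ' < "$PARAM"); do
        echo "${1:-?} $v"
        [ $# -gt 0 ] && shift
    done
}

# Write the same byte limit for all four IO priority classes.
set_all_limits() {
    echo "$1,$1,$1,$1" > "$PARAM"
}

# Example (needs root on a device with the patch applied):
#   set_all_limits 8192
#   show_limits
```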

Call verity_verify_io directly when verity_end_io is in softirq.

Signed-off-by: LongPing Wei <weilongp...@oppo.com>

Are you sure that 4096 bytes is the correct threshold?

I suggest that you run the benchmarks for 4k, 8k, 16k, 32k, 64k, 128k,
256k, 512k and set the default threshold to the largest value where bh
code performs better than non-bh code.

Mikulas


Hi, Mikulas

My test device is a smartphone based on Snapdragon 6 Gen 1 with 512GB
UFS 3.1 and 12GiB DDR.

./fio --name=rand_read --ioengine=psync --rw=randread --bs=4K \
   --direct=1 --numjobs=8 --runtime=60 --time_based --group_reporting \
   --filename=/dev/block/mapper/xx-verity

origin: ~165 MiB/s
after: ~215 MiB/s

./fio --name=rand_read --ioengine=psync --rw=randread --bs=8K \
   --direct=1 --numjobs=8 --runtime=60 --time_based --group_reporting \
   --filename=/dev/block/mapper/xx-verity
origin: ~265 MiB/s
after: 132, 268, 308, 302, 116 MiB/s (avg: 225.2 MiB/s)
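
The spread of the five 8K runs can be checked with a small helper like the
one below. The summarize function is a hypothetical name used only for
this sketch; the input values are the five results listed above.

```shell
# Summarize per-run throughput values (MiB/s) passed as arguments.
summarize() {
    printf '%s\n' "$@" | awk '
        { v = $1 + 0 }
        NR == 1 { min = max = v }
        { sum += v; if (v < min) min = v; if (v > max) max = v }
        END { printf "min=%d max=%d avg=%.1f\n", min, max, sum / NR }'
}

summarize 132 268 308 302 116   # prints: min=116 max=308 avg=225.2
```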

As I explained to Eric:

> On the low-end device I tested locally, soft interrupts would be
> concentrated on CPU4~7. I tried to increase use_bh_bytes but got
> negative effects. The time cost of a single block softirq becomes
> higher, which will defer subsequent block softirq.
>
> But on devices with UFS MCQ or NVMe SSDs, most verity_end_io calls should
> happen directly in hardirq context rather than softirq. These devices
> have the potential to raise the threshold.

I cannot run the benchmarks on high-end devices with UFS MCQ now
as I don't have a test device with a 6.12 kernel. Existing Android
products stay on the LTS version they launched with until that
version is EOL. I will share the results with you once I get a new
product with 6.12 and UFS MCQ and can run the benchmarks.

I would prefer 4096 as the default threshold, as it won't introduce
negative effects on any device, including low-end devices like mine.

LongPing
