On Tue, Jun 24, 2025 at 11:29:28AM +0530, Kundan Kumar wrote:
> On Wed, Jun 11, 2025 at 9:21 PM Darrick J. Wong <djw...@kernel.org> wrote:
> >
> > On Wed, Jun 04, 2025 at 02:52:34PM +0530, Kundan Kumar wrote:
> > > > > > For xfs used this command:
> > > > > > xfs_io -c "stat" /mnt/testfile
> > > > > > And for ext4 used this:
> > > > > > filefrag /mnt/testfile
> > > > >
> > > > > filefrag merges contiguous extents, and only counts up for
> > > > > discontiguous mappings, while fsxattr.nextents counts all extents
> > > > > even if they are contiguous. So you probably want to use filefrag
> > > > > for both cases.
> > > >
> > > > Got it — thanks for the clarification. We'll switch to using filefrag
> > > > and will share updated extent count numbers accordingly.
> > >
> > > Using filefrag, we recorded extent counts on xfs and ext4 at three
> > > stages:
> > > a. Just after a 1G random write,
> > > b. After a 30-second wait,
> > > c. After unmounting and remounting the filesystem.
> > >
> > > xfs
> > > Base
> > > a. 6251 b. 2526 c. 2526
> > > Parallel writeback
> > > a. 6183 b. 2326 c. 2326
> >
> > Interesting that the mapping record count goes down...
> >
> > I wonder, you said the xfs filesystem has 4 AGs and 12 cores, so I guess
> > wb_ctx_arr[] is 12? I wonder, do you see a knee point in writeback
> > throughput when the # of wb contexts exceeds the AG count?
> >
> > Though I guess for the (hopefully common) case of pure overwrites, we
> > don't have to do any metadata updates, so we wouldn't really hit a
> > scaling limit due to ag count or log contention or whatever. Does that
> > square with what you see?
>
> Hi Darrick,
>
> We analyzed AG count vs. number of writeback contexts to identify any
> knee point. Earlier, wb_ctx_arr[] was fixed at 12; now we varied
> nr_wb_ctx and measured the impact.
>
> We implemented a configurable number of writeback contexts so that
> throughput could be measured more easily. This feature will be exposed
> in the next series. To configure it, we used:
> echo <nr_wb_ctx> > /sys/class/bdi/259:2/nwritebacks
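>
> Since the 259:2 numbers are device specific, here is a rough sketch of
> how we set and verify the knob (this assumes the nwritebacks sysfs file
> from this series, and that the bdi name matches the device's MAJ:MIN):
>
> bdi=$(lsblk -no MAJ:MIN /dev/nvme0n1 | tr -d '[:space:]')
> echo 6 > /sys/class/bdi/$bdi/nwritebacks    # e.g. six writeback contexts
> cat /sys/class/bdi/$bdi/nwritebacks         # confirm the value took effect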
>
> In our test, writing 1G across 12 directories showed bandwidth
> improving up to the number of allocation groups (AGs), which is
> essentially the knee point, with gains tapering off beyond that. We
> also see a good increase in bandwidth, about 16x, from base to
> nr_wb_ctx = 6.
>
> Base (single threaded)              : 9799KiB/s
> Parallel Writeback (nr_wb_ctx = 1)  : 9727KiB/s
> Parallel Writeback (nr_wb_ctx = 2)  : 18.1MiB/s
> Parallel Writeback (nr_wb_ctx = 3)  : 46.4MiB/s
> Parallel Writeback (nr_wb_ctx = 4)  : 135MiB/s
> Parallel Writeback (nr_wb_ctx = 5)  : 160MiB/s
> Parallel Writeback (nr_wb_ctx = 6)  : 163MiB/s

Heh, nice!

> Parallel Writeback (nr_wb_ctx = 7)  : 162MiB/s
> Parallel Writeback (nr_wb_ctx = 8)  : 154MiB/s
> Parallel Writeback (nr_wb_ctx = 9)  : 152MiB/s
> Parallel Writeback (nr_wb_ctx = 10) : 145MiB/s
> Parallel Writeback (nr_wb_ctx = 11) : 145MiB/s
> Parallel Writeback (nr_wb_ctx = 12) : 138MiB/s
>
> System config
> =============
> Number of CPUs = 12
> System RAM = 9G
> For XFS, number of AGs = 4
> Used NVMe SSD of 3.84 TB (Enterprise SSD PM1733a)
>
> Script
> ======
> mkfs.xfs -f /dev/nvme0n1
> mount /dev/nvme0n1 /mnt
> echo <nr_wb_ctx> > /sys/class/bdi/259:2/nwritebacks
> sync
> echo 3 > /proc/sys/vm/drop_caches
>
> for i in {1..12}; do
>     mkdir -p /mnt/dir$i
> done
>
> fio job_nvme.fio
>
> umount /mnt
> echo 3 > /proc/sys/vm/drop_caches
> sync
>
> fio job
> =======
> [global]
> bs=4k
> iodepth=1
> rw=randwrite
> ioengine=io_uring
> nrfiles=12
> numjobs=1        # Each job writes to a different directory
> size=1g
> direct=0         # Buffered I/O to trigger writeback
> group_reporting=1
> create_on_open=1
> name=test
>
> [job1]
> directory=/mnt/dir1
>
> [job2]
> directory=/mnt/dir2
> ...
> ...
> [job12]
> directory=/mnt/dir12
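>
> The extent counts at stages (a), (b), and (c) earlier in the thread
> were gathered along these lines (a sketch; globbing over the test
> directories is assumed):
>
> fio job_nvme.fio
> filefrag /mnt/dir*/*     # (a) just after the 1G random write
> sleep 30
> filefrag /mnt/dir*/*     # (b) after the 30-second wait
> umount /mnt
> mount /dev/nvme0n1 /mnt
> filefrag /mnt/dir*/*     # (c) after unmount and remount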