All my OSSs are backed by SSDs, so I don't think that would be the reason behind such an improvement.
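For reference, this can be confirmed on each OSS by checking the rotational flag of the backing block devices (the device names below are only examples):

  # 0 = non-rotational (SSD), 1 = rotational (HDD); sdb and sdc stand in for the OST devices
  cat /sys/block/sdb/queue/rotational
  cat /sys/block/sdc/queue/rotational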
Thanks,
Hasan

On Sun, Oct 17, 2021 at 3:54 AM Andreas Dilger <[email protected]> wrote:

> It seems likely, if you are using spinning disks (HDD), that
> max_rpcs_in_flight=1 will reduce seeking on the OSTs, especially with
> large RPC sizes (16MB).
>
> Cheers, Andreas
>
> On Oct 16, 2021, at 16:11, Md Hasanur Rashid <[email protected]> wrote:
>
> Hi Everyone,
>
> Thanks for getting back to me, Andreas and Patrick. The reason I am
> experimenting with max_rpcs_in_flight=1 is that I have observed a
> significant improvement over the default value with the following workload:
>
> define fileset name="testF",entries=100,filesize=16m,prealloc,path="/mnt/hasanfs/tmp1"
>
> define process name="readerP",instances=2 {
>   thread name="readerT",instances=4 {
>     flowop openfile name="openOP",filesetname="testF"
>     flowop writewholefile name="writeOP",iters=4,filesetname="testF"
>     flowop readwholefile name="readOP",iters=1,filesetname="testF"
>     flowop closefile name="closeOP"
>   }
> }
>
> create files
> system "sync"
> system "echo 3 > /proc/sys/vm/drop_caches"
>
> run 60
>
> The performance with the default max_rpcs_in_flight value looks like the
> following:
>
> *max_rpcs_in_flight=8, max_pages_per_rpc=256, max_dirty_mb=2000*
>
> 64.164: Per-Operation Breakdown
> closeOP    5612ops   94ops/s    0.0mb/s   0.581ms/op [0.278ms - 1.966ms]
> readOP     5612ops   94ops/s 5985.6mb/s  19.335ms/op [14.957ms - 83.853ms]
> writeOP   22462ops  374ops/s 5989.3mb/s  16.248ms/op [5.683ms - 626.451ms]
> openOP     5620ops   94ops/s    0.0mb/s   0.539ms/op [0.313ms - 27.266ms]
> 64.164: IO Summary: 39306 ops 655.039 ops/s 94/374 rd/wr 11974.9mb/s 12.206ms/op
>
> If we change max_rpcs_in_flight to 1, the performance is the following:
>
> *max_rpcs_in_flight=1, max_pages_per_rpc=256, max_dirty_mb=2000*
>
> 64.048: Per-Operation Breakdown
> closeOP    8631ops  144ops/s    0.0mb/s   0.593ms/op [0.273ms - 4.810ms]
> readOP     8631ops  144ops/s 9205.6mb/s  20.187ms/op [16.292ms - 37.513ms]
> writeOP   34545ops  576ops/s 9211.2mb/s   8.561ms/op [5.860ms - 55.334ms]
> openOP     8639ops  144ops/s    0.0mb/s   0.552ms/op [0.293ms - 10.132ms]
> 64.048: IO Summary: 60446 ops 1007.344 ops/s 144/576 rd/wr 18416.8mb/s 7.939ms/op
>
> As you can see from the above, performance improved when the value was
> changed to 1. Unfortunately, I am still not sure why this improvement is
> observed. I ran the workload from a single client. The cluster setup looks
> like the following (5 server machines: 1 for MGS+MDS, and the other 4 as
> OSSs, each with 2 OSTs):
>
> UUID                   bytes    Used  Available  Use%  Mounted on
> hasanfs-MDT0000_UUID  144.8G   78.1M     134.8G    0%  /mnt/hasanfs[MDT:0]
> hasanfs-OST0000_UUID  197.3G    1.1G     185.4G    1%  /mnt/hasanfs[OST:0]
> hasanfs-OST0001_UUID  197.3G    1.2G     185.3G    1%  /mnt/hasanfs[OST:1]
> hasanfs-OST0002_UUID  197.3G    1.2G     185.3G    1%  /mnt/hasanfs[OST:2]
> hasanfs-OST0003_UUID  197.3G    1.2G     185.2G    1%  /mnt/hasanfs[OST:3]
> hasanfs-OST0004_UUID  197.3G    1.2G     185.3G    1%  /mnt/hasanfs[OST:4]
> hasanfs-OST0005_UUID  197.3G    1.2G     185.4G    1%  /mnt/hasanfs[OST:5]
> hasanfs-OST0006_UUID  197.3G    1.2G     185.3G    1%  /mnt/hasanfs[OST:6]
> hasanfs-OST0007_UUID  197.3G    1.2G     185.4G    1%  /mnt/hasanfs[OST:7]
>
> filesystem summary:     1.5T    9.6G       1.4T    1%  /mnt/hasanfs
>
> It would be really helpful and highly appreciated if you could share your
> intuition as to why such an improvement may have been observed.
>
> Thanks,
> Md. Hasanur Rashid
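For reference, a minimal sketch of how the tunable can be set and verified on the client around a run, and how the per-OSC RPC histograms can be reset so they reflect only that run (resetting rpc_stats by writing to it is an assumption):

  lctl set_param osc.hasanfs-*.max_rpcs_in_flight=1   # set the limit on the client
  lctl get_param osc.hasanfs-*.max_rpcs_in_flight     # verify before and after the run
  lctl set_param osc.hasanfs-*.rpc_stats=0            # assumed to clear the per-OSC histograms
  lctl get_param osc.hasanfs-*.rpc_stats              # read the histograms after the run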
> On Thu, Oct 7, 2021 at 5:45 PM Andreas Dilger <[email protected]> wrote:
>
>> On Oct 7, 2021, at 13:19, Md Hasanur Rashid via lustre-discuss <[email protected]> wrote:
>>
>> Hello Everyone,
>>
>> I am running the Filebench benchmark on my Lustre cluster. I set the
>> max_rpcs_in_flight value to 1. Before and after executing, I verified
>> that the value of max_rpcs_in_flight is indeed 1. However, when I check
>> rpc_stats, it shows a much higher value for RPCs in flight.
>>
>> If by "much higher than 1" you mean "2", then yes, it appears there are
>> mostly (95%) 2 RPCs being processed concurrently on this OST.
>>
>> That might happen if you have 2 clients/mountpoints writing to the same
>> OST, it might be an off-by-one logic error allowing an extra RPC in
>> flight, it might be intentional for some reason (e.g. to avoid deadlock,
>> memory pressure, etc.), or it might be an accounting error in the
>> statistics (e.g. counting the next RPC to be sent before the first one
>> is marked finished).
>>
>> The following is the output for one OSC, just for reference:
>>
>> osc.hasanfs-OST0000-osc-ffff882fcf777000.rpc_stats=
>> snapshot_time:         1632483604.967291 (secs.usecs)
>> read RPCs in flight:   0
>> write RPCs in flight:  0
>> pending write pages:   0
>> pending read pages:    0
>>
>>                       read                    write
>> pages per rpc   rpcs   %  cum %  |  rpcs   %  cum %
>> 1:                 1 100    100  |     0   0      0
>> 2:                 0   0    100  |     0   0      0
>> 4:                 0   0    100  |     0   0      0
>> 8:                 0   0    100  |     0   0      0
>> 16:                0   0    100  |     0   0      0
>> 32:                0   0    100  |     0   0      0
>> 64:                0   0    100  |     0   0      0
>> 128:               0   0    100  |     0   0      0
>> 256:               0   0    100  |  9508 100    100
>>
>>                       read                    write
>> rpcs in flight  rpcs   %  cum %  |  rpcs   %  cum %
>> 0:                 0   0      0  |     0   0      0
>> 1:                 1 100    100  |    10   0      0
>> 2:                 0   0    100  |  9033  95     95
>> 3:                 0   0    100  |   465   4    100
>>
>>                       read                    write
>> offset          rpcs   %  cum %  |  rpcs   %  cum %
>> 0:                 1 100    100  |   725   7      7
>> 1:                 0   0    100  |     0   0      7
>> 2:                 0   0    100  |     0   0      7
>> 4:                 0   0    100  |     0   0      7
>> 8:                 0   0    100  |     0   0      7
>> 16:                0   0    100  |     0   0      7
>> 32:                0   0    100  |     0   0      7
>> 64:                0   0    100  |     0   0      7
>> 128:               0   0    100  |     0   0      7
>> 256:               0   0    100  |   718   7     15
>> 512:               0   0    100  |  1386  14     29
>> 1024:              0   0    100  |  2205  23     52
>> 2048:              0   0    100  |  1429  15     67
>> 4096:              0   0    100  |  1103  11     79
>> 8192:              0   0    100  |  1942  20    100
>>
>> Can anyone please explain to me why the RPCs in flight shown in rpc_stats
>> could be higher than max_rpcs_in_flight?
>>
>> I do see similar behavior in the statistics on my home system, which has
>> the default osc.*.max_rpcs_in_flight=8 but shows many cases of 9 RPCs in
>> flight in the statistics for both read and write, and in a few cases 10
>> or 11:
>>
>>                       read                    write
>> rpcs in flight  rpcs   %  cum %  |  rpcs   %  cum %
>> 1:               121   2      2  | 27831  93     93
>> 2:                23   0      3  |   108   0     93
>> 3:                22   0      3  |    19   0     93
>> 4:                24   0      4  |    15   0     93
>> 5:                19   0      5  |    10   0     93
>> 6:                26   0      5  |    13   0     93
>> 7:               176   4      9  |    39   0     93
>> 8:               933  22     32  |    75   0     94
>> 9:              2802  67     99  |  1207   4     98
>> 10:               10   0    100  |   543   1     99
>> 11:                0   0    100  |     1   0    100
>>
>> The good news is that Lustre is open source, so you can look into the
>> lustre/osc code to see why this is happening. The limit is set by
>> cli->cl_max_rpcs_in_flight, and the stats are accounted by
>> cli->cl_write_rpc_hist.
>>
>> Out of curiosity, you don't say _why_ this off-by-one error is of
>> interest? It definitely seems like a bug that could be fixed, but it
>> doesn't seem too critical to correct functionality.
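A quick way to locate that code (assuming a Lustre source checkout with the usual layout) is to grep for the fields mentioned above:

  # cl_max_rpcs_in_flight enforces the limit; cl_write_rpc_hist feeds the write-RPC histogram
  grep -rn "cl_max_rpcs_in_flight" lustre/
  grep -rn "cl_write_rpc_hist" lustre/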
>> Cheers, Andreas
>> --
>> Andreas Dilger
>> Lustre Principal Architect
>> Whamcloud
_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
