Hi,

At 2024-09-09 21:37:35, "Kent Overstreet" <[email protected]> wrote:
>On Sat, Sep 07, 2024 at 06:34:37PM GMT, David Wang wrote:
>> Based on the result:
>> 1. The row with prepare-write size 4K stands out here.
>>    When files were prepared with write size 4K, the subsequent
>>    read performance is worse. (I did double check the result,
>>    but it is possible that I missed some affecting factors.)
>
>On small blocksize tests you should be looking at IOPS, not MB/s.
>
>Prepare-write size is the column?

Each row is for a specific prepare-write size, indicated by the first column.

>Another factor is that we do merge extents (including checksums); so if
>the prepare-write is done sequentially we won't actually be ending up
>with extents of the same size as what we wrote.
>
>I believe there's a knob somewhere to turn off extent merging (module
>parameter? it's intended for debugging).

I did some debugging: when performance is bad, the conditions
bvec_iter_sectors(iter) != pick.crc.uncompressed_size and
bvec_iter_sectors(iter) != pick.crc.live_size are "almost" always both
true, while when performance is good (after a "thorough" write), only a
small fraction of reads (~350 out of 1000000) hit them. When those
conditions are true, "bounce" gets set and the code seems to run on a
time-consuming path. I suspect a read alone could never change those
conditions, but a write can?

>> 2. Without O_DIRECT, read performance seems correlated with the difference
>> between read size and prepare-write size, but with O_DIRECT, the correlation
>> is not obvious.
>
>So the O_DIRECT and buffered IO paths are very different (in every
>filesystem) - you're looking at very different things. They are both
>subject to the checksum granularity issue, but in buffered mode we round
>up reads to extent size, when filling into the page cache.
>
>Big standard deviation (high tail latency?) is something we'd want to
>track down. There's a bunch of time_stats in sysfs, but they're mostly
>for the write paths. If you're trying to identify where the latencies
>are coming from, we can look at adding some new time stats to isolate.
