Hi.

While experimenting with the SPEC NFS benchmark, I noticed a significant
amount of CPU time spent in the ZFS prefetcher code. Most interesting is
that the time was spent when the prefetcher was invoked not for data
access, but by dnode_hold()->dbuf_read() when accessing blocks of
DMU_META_DNODE(os). Further investigation showed that the prefetcher
apparently misinterprets random access to a very large number of files
(DMU_META_DNODE records) as strided in some odd way, which generates
many pointless prefetches.

I see the source of that behavior in the overly soft handling of strided
streams in the prefetcher. A sequential stream requires each following
block to be read for the stream to advance. A strided stream, by
contrast, is much softer: it counts any request within zst_len of the
next stride as a hit, triggering another large portion of prefetch. As a
result, if some random prefetch stream manages to become strided (which
requires only three luckily placed requests), it already has a higher
chance of living longer, and if it then accumulates some zst_len, it may
jump forward and backward, generating random prefetches for a very long
time.

I've made a simple patch
(https://people.freebsd.org/~mav/strict_stride.patch) that gives strided
prefetch the same strict limitations as sequential prefetch has. It
completely fixes the problem described above, while strided patterns
generated by iozone are still detected as such. My only suspicion is
that the present algorithm may have been designed for some odder access
pattern that allows reads within a stride to be done non-sequentially,
but I cannot imagine an application where such a pattern would be common
enough to tune for.

Am I missing something?

-- 
Alexander Motin
_______________________________________________
developer mailing list
[email protected]
http://lists.open-zfs.org/mailman/listinfo/developer
