I noticed that on some of our systems we were getting a clearly abnormal number
of l2arc checksum errors accounted in l2_cksum_bad.  The hardware appeared to be
in good health.  Using DTrace I noticed that the data seemed to be overwritten
with other data.  After more DTrace analysis I observed that sometimes
l2arc_write_buffers() would advance l2ad_hand by more than target_sz.
This meant that l2arc_write_buffers() would write beyond a region cleared by
l2arc_evict() and thus overwrite data belonging to non-evicted buffers.  Havoc
ensues.

The cache devices in question are all SSDs with logical sector size of 4KB.
I am not sure about other ZFS platforms, but on FreeBSD this fact is detected
and ashift of 12 is used for the cache vdevs.

Looking at l2arc_write_buffers() code you can see that it properly accounts for
a vdev ashift when actually writing buffers and advancing l2ad_hand:
                        /*
                         * Keep the clock hand suitably device-aligned.
                         */
                        buf_p_sz = vdev_psize_to_asize(dev->l2ad_vdev, buf_sz);
                        write_psize += buf_p_sz;
                        dev->l2ad_hand += buf_p_sz;

But the same is not done when selecting buffers to be written and checking that
target_sz is not exceeded.
So, if ARC contains a lot of buffers smaller than 4K that means that on-disk
size of the L2ARC buffers could be quite larger than their logical size.

I propose the following patch which has been tested and seems to fix the problem
without introducing any side effects:
https://github.com/avg-I/freebsd/compare/review;l2arc-write-target-size.diff
https://github.com/avg-I/freebsd/compare/review;l2arc-write-target-size

I think that the following FreeBSD changes might also be of interest to OpenZFS
community:
https://github.com/freebsd/freebsd/commit/bd47afb2893#diff-23
https://github.com/freebsd/freebsd/commit/1c55b38ae
-- 
Andriy Gapon
_______________________________________________
developer mailing list
[email protected]
http://lists.open-zfs.org/mailman/listinfo/developer

Reply via email to