On Mon, Mar 3, 2014 at 1:17 AM, Andriy Gapon <[email protected]> wrote:

>
> I noticed that on some of our systems we were getting a clearly abnormal
> number
> of l2arc checksum errors accounted in l2_cksum_bad.  The hardware appeared
> to be
> in good health.  Using DTrace I noticed that the data seemed to be
> overwritten
> with other data.  After more DTrace analysis I observed that sometimes
> l2arc_write_buffers() would advance l2ad_hand by more than target_sz.
> This meant that l2arc_write_buffers() would write beyond a region cleared
> by
> l2arc_evict() and thus overwrite data belonging to non-evicted buffers.
>  Havoc
> ensues.
>
> The cache devices in question are all SSDs with logical sector size of 4KB.
> I am not sure about other ZFS platforms, but on FreeBSD this fact is
> detected
> and ashift of 12 is used for the cache vdevs.
>
> Looking at l2arc_write_buffers() code you can see that it properly
> accounts for
> a vdev ashift when actually writing buffers and advancing l2ad_hand:
>                         /*
>                          * Keep the clock hand suitably device-aligned.
>                          */
>                         buf_p_sz = vdev_psize_to_asize(dev->l2ad_vdev,
> buf_sz);
>                         write_psize += buf_p_sz;
>                         dev->l2ad_hand += buf_p_sz;
>
> But the same is not done when selecting buffers to be written and checking
> that
> target_sz is not exceeded.
> So, if ARC contains a lot of buffers smaller than 4K that means that
> on-disk
> size of the L2ARC buffers could be quite larger than their logical size.
>
> I propose the following patch which has been tested and seems to fix the
> problem
> without introducing any side effects:
>
> https://github.com/avg-I/freebsd/compare/review;l2arc-write-target-size.diff
> https://github.com/avg-I/freebsd/compare/review;l2arc-write-target-size


Your analysis and fix look correct to me.

I think you have misnamed "buf_p_sz".  It is the asize (Allocation Size),
not the psize.  Thus a more appropriate name would be "buf_asize" or
"buf_a_sz".  Likewise, you should create a new variable "write_asize" to
use instead of "write_psize" here.  (looks like you've copied the existing
incorrect name from line 5133; that should be renamed too)


>
>
> I think that the following FreeBSD changes might also be of interest to
> OpenZFS
> community:
> https://github.com/freebsd/freebsd/commit/bd47afb2893#diff-23
> https://github.com/freebsd/freebsd/commit/1c55b38ae


Those also look reasonable to me.

--matt


>
> --
> Andriy Gapon
>
> _______________________________________________
> developer mailing list
> [email protected]
> http://lists.open-zfs.org/mailman/listinfo/developer
>
>
_______________________________________________
developer mailing list
[email protected]
http://lists.open-zfs.org/mailman/listinfo/developer

Reply via email to