On 30/03/2014 02:33, Matthew Ahrens wrote:
> 
> 
> 
> On Mon, Mar 3, 2014 at 1:17 AM, Andriy Gapon <[email protected]
> <mailto:[email protected]>> wrote:
> 
> 
>     I noticed that on some of our systems we were getting a clearly abnormal 
> number
>     of l2arc checksum errors accounted in l2_cksum_bad.  The hardware 
> appeared to be
>     in good health.  Using DTrace I noticed that the data seemed to be 
> overwritten
>     with other data.  After more DTrace analysis I observed that sometimes
>     l2arc_write_buffers() would advance l2ad_hand by more than target_sz.
>     This meant that l2arc_write_buffers() would write beyond a region cleared 
> by
>     l2arc_evict() and thus overwrite data belonging to non-evicted buffers.  
> Havoc
>     ensues.
> 
>     The cache devices in question are all SSDs with logical sector size of 
> 4KB.
>     I am not sure about other ZFS platforms, but on FreeBSD this fact is 
> detected
>     and ashift of 12 is used for the cache vdevs.
> 
>     Looking at l2arc_write_buffers() code you can see that it properly 
> accounts for
>     a vdev ashift when actually writing buffers and advancing l2ad_hand:
>                             /*
>                              * Keep the clock hand suitably device-aligned.
>                              */
>                             buf_p_sz = vdev_psize_to_asize(dev->l2ad_vdev, 
> buf_sz);
>                             write_psize += buf_p_sz;
>                             dev->l2ad_hand += buf_p_sz;
> 
>     But the same is not done when selecting buffers to be written and 
> checking that
>     target_sz is not exceeded.
>     So, if ARC contains a lot of buffers smaller than 4K that means that 
> on-disk
>     size of the L2ARC buffers could be quite larger than their logical size.
> 
>     I propose the following patch which has been tested and seems to fix the 
> problem
>     without introducing any side effects:
>     
> https://github.com/avg-I/freebsd/compare/review;l2arc-write-target-size.diff
>     https://github.com/avg-I/freebsd/compare/review;l2arc-write-target-size
> 
> 
> Your analysis and fix look correct to me.
> 
> I think you have misnamed "buf_p_sz".  It is the asize (Allocation Size), not
> the psize.  Thus a more appropriate name would be "buf_asize" or "buf_a_sz".
>  Likewise, you should create a new variable "write_asize" to use instead of
> "write_psize" here.  (looks like you've copied the existing incorrect name 
> from
> line 5133; that should be renamed too)
>  

Sorry for taking so long to get back.
I looked at the code and it seems that there is a potential for a small
confusion with "asize" there.  In some cases it means "allocation size" as you
point out, but in other cases it seems to stand for something like "actual
size".  For example, b_asize of l2arc_buf_hdr_t stores a size of b_tmp_cdata,
which could be equal to b_size if the buffer is not compressed or to some
smaller value otherwise.
Also, arcstat_l2_size accounts logical size of data in L2ARC, arcstat_l2_asize
accounts "actual" size of data adjusted for compression  -- in a sense, should
this be called physical size instead?  No kstat currently accounts for
allocated / on-disk size of data, which could be greater than arcstat_l2_asize
because of rounding up to a sector size.

Could you please clarify the canonical nomenclature of size, psize and asize?
I will the adjust the code to have proper variable names.
Perhaps, arcstat_l2_asize should actually account allocated size as well?  I
think that space usage for an L2 device should definitely reflect allocated
size.  That is, integral of vdev_psize_to_asize(b_asize) as opposed to just 
b_asize.

Thank you!
-- 
Andriy Gapon
_______________________________________________
developer mailing list
[email protected]
http://lists.open-zfs.org/mailman/listinfo/developer

Reply via email to