That problem makes sense to me, and your proposed solution seems correct in
principle.  I'm not totally sure if there's anything else that relies on
the existing behavior in dbuf_read_impl().

--matt
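
A minimal sketch of the proposal quoted below, written as a toy model rather than actual ZFS code. The `toy_bp_t` struct and the `toy_*` functions are hypothetical stand-ins (the real `blkptr_t` and `dbuf_read_impl()` are not used here), and the rule that an incremental send reports a hole only when its birth txg is newer than the "from" snapshot is an assumption taken from the description in the thread. The txg values 548 and 1749, the 128 child pointers, and the 10 overwritten blocks come from the zdb output quoted below.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Child pointers per indirect block (a 4000L block of 128-byte pointers). */
#define TOY_NBPS 128

/* Simplified stand-in for a block pointer; NOT the real ZFS blkptr_t. */
typedef struct toy_bp {
	uint64_t addr;   /* 0 means "no block allocated", i.e. a hole */
	uint64_t birth;  /* txg in which this pointer (or hole) was born */
} toy_bp_t;

static int
toy_is_hole(const toy_bp_t *bp)
{
	return (bp->addr == 0);
}

/*
 * Materialize the children of a parent hole.
 * backfill == 0: current behavior described in the thread (plain zeroing),
 *                so every child hole ends up with birth 0.
 * backfill != 0: proposed behavior, every child hole inherits the birth
 *                txg of the parent hole.
 */
static void
toy_fill(toy_bp_t *child, uint64_t parent_birth, int backfill)
{
	memset(child, 0, TOY_NBPS * sizeof (toy_bp_t));
	if (!backfill)
		return;
	for (int i = 0; i < TOY_NBPS; i++)
		child[i].birth = parent_birth;
}

/*
 * Replay the scenario from the thread and count how many child holes an
 * incremental send from from_txg would report as FREE, assuming a hole
 * is reported only if it was born after from_txg.
 */
int
toy_free_records(int backfill, uint64_t from_txg)
{
	const uint64_t hole_birth = 548;   /* birth of the original L1 hole */
	const uint64_t write_txg = 1749;   /* txg of the partial overwrite */
	const int nwritten = 10;           /* L0 blocks overwritten */
	toy_bp_t child[TOY_NBPS];
	int nfree = 0;

	assert(nwritten <= TOY_NBPS);
	toy_fill(child, hole_birth, backfill);

	/* Partial overwrite: the first nwritten children become data blocks. */
	for (int i = 0; i < nwritten; i++) {
		child[i].addr = 0x1000 + (uint64_t)i;  /* any non-zero address */
		child[i].birth = write_txg;
	}

	for (int i = 0; i < TOY_NBPS; i++) {
		if (toy_is_hole(&child[i]) && child[i].birth > from_txg)
			nfree++;
	}
	return (nfree);
}
```

With a hypothetical "from" snapshot at txg 400, `toy_free_records(0, 400)` returns 0 (all remaining holes look "old"), while `toy_free_records(1, 400)` returns 118, one FREE per remaining child hole.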

On Tue, Nov 17, 2015 at 2:19 PM, Boris <bprotopo...@hotmail.com> wrote:

> So, I am thinking of modifying dbuf_read_impl() to fill the arc buf
> with holes of the appropriate level instead of doing bzero() when
> reading metadata-level holes whose birth epoch is greater than zero (or
> perhaps greater than the hole_birth feature txg).
>
>
> Boris.
>
> ------------------------------
> *From:* Boris <bprotopo...@hotmail.com>
> *Sent:* Tuesday, November 17, 2015 3:00 PM
> *To:* Matthew Ahrens
> *Cc:* developer@open-zfs.org; zfs-de...@list.zfsonlinux.org
> *Subject:* Re: [OpenZFS Developer] zfs send not detecting new holes
>
>
> Hi, Matt,
>
>
> I believe I did reproduce the problem. The difficulty was really with
> creating an L1 hole, which I managed with a zfs recv of an empty L1 range
> from one zvol to another. The target zvol had an L1 hole in place of the L1
> range filled with L0 holes in the source zvol.
>
> The issue that I see is as follows (the datasets have compression on, the
> pool has the hole_birth feature active). If the L1 hole is later partially
> overwritten with non-zero data, then the result is that a new L1 block is
> allocated and is partially filled in with new L0 block pointers pointing to
> non-zero blocks. Unfortunately, the rest of the L1 block appears to be left
> initialized with zeros (to zdb, it looks like a bunch of holes with 0 birth
> epoch). But this is the wrong thing to do, because now the hole at the
> end of the L1 range in question is "old", whereas it should retain the birth
> epoch of the original L1 hole ("new"). But it does not. So, the next zfs
> send disregards this hole, which results in lost FREE record(s) in the
> corresponding zfs send stream.
>
> I have the datasets snapped and local, so I can reproduce this problem and
> can dump any zdb data if needed. Here are some snippets of the zdb output.
>
> Before the overwrite:
>
>           7c0000   L0 1:2584800:10000 10000L/10000P F=1 B=346/346
>
>           7d0000   L0 1:2594800:10000 10000L/10000P F=1 B=346/346
>
>           7e0000   L0 1:25a4800:10000 10000L/10000P F=1 B=346/346
>
>           7f0000   L0 1:25b4800:10000 10000L/10000P F=1 B=346/346
>
>           800000  L1  4000L B=548
>
>          1000000  L1  0:7c23e00:400 1:7c2d200:400 4000L/400P F=128
> B=1268/1268
>
>          1000000   L0 0:7808c00:10000 10000L/10000P F=1 B=1268/1268
>
>          1010000   L0 0:7818c00:10000 10000L/10000P F=1 B=1268/1268
>
>          1020000   L0 0:7828c00:10000 10000L/10000P F=1 B=1268/1268
>
> The L1 hole is at offset 800000. After the partial overwrite (10 blocks
> written at the beginning of the L1 range):
>
>           7d0000   L0 1:2594800:10000 10000L/10000P F=1 B=346/346
>
>           7e0000   L0 1:25a4800:10000 10000L/10000P F=1 B=346/346
>
>           7f0000   L0 1:25b4800:10000 10000L/10000P F=1 B=346/346
>
>           800000  L1  0:ea36000:600 1:f2d6000:600 4000L/600P F=10
> B=1749/1749
>
>           800000   L0 1:f246000:10000 10000L/10000P F=1 B=1749/1749
>
>           810000   L0 1:f256000:10000 10000L/10000P F=1 B=1749/1749
>
>           820000   L0 1:f266000:10000 10000L/10000P F=1 B=1749/1749
>
>           830000   L0 1:f276000:10000 10000L/10000P F=1 B=1749/1749
>
>           840000   L0 1:f286000:10000 10000L/10000P F=1 B=1749/1749
>
>           850000   L0 1:f296000:10000 10000L/10000P F=1 B=1749/1749
>
>           860000   L0 1:f2a6000:10000 10000L/10000P F=1 B=1749/1749
>
>           870000   L0 1:f2c6000:10000 10000L/10000P F=1 B=1749/1749
>
>           880000   L0 1:f2b6000:10000 10000L/10000P F=1 B=1749/1749
>
>           890000   L0 0:ea26000:10000 10000L/10000P F=1 B=1749/1749
>
>          1000000  L1  0:7c23e00:400 1:7c2d200:400 4000L/400P F=128
> B=1268/1268
>
>          1000000   L0 0:7808c00:10000 10000L/10000P F=1 B=1268/1268
>
> Dump of the new L1 block's contents:
>
> # zdb tpool -R 0:ea36000:600:di
>
> Found vdev: /dev/sdk1
>
> DVA[0]=<1:f246000:10000> [L0 zvol object] fletcher4 uncompressed LE
> contiguous unique single size=10000L/10000P birth=1749L/1749P fill=1
> cksum=2041f382f58d:408f994ef048de7:daf0e1cadf74f53:47c3fdb952a1e13f
>
> DVA[0]=<1:f256000:10000> [L0 zvol object] fletcher4 uncompressed LE
> contiguous unique single size=10000L/10000P birth=1749L/1749P fill=1
> cksum=20306f1938d5:403381ccd468d8a:713a193137858160:d91b1c5cecb306af
>
> DVA[0]=<1:f266000:10000> [L0 zvol object] fletcher4 uncompressed LE
> contiguous unique single size=10000L/10000P birth=1749L/1749P fill=1
> cksum=1fe160444ab9:3fcbaeb1f31c86e:11198655b490d3b5:76cd3d278385af3e
>
> DVA[0]=<1:f276000:10000> [L0 zvol object] fletcher4 uncompressed LE
> contiguous unique single size=10000L/10000P birth=1749L/1749P fill=1
> cksum=201cb0f386db:4035cd7deb41749:6b4d734a11ce04b5:1fbc2dc2f169dcae
>
> DVA[0]=<1:f286000:10000> [L0 zvol object] fletcher4 uncompressed LE
> contiguous unique single size=10000L/10000P birth=1749L/1749P fill=1
> cksum=202d87c1a695:403d7ff6a1a6f53:66b049fa47216fb4:848b133855fab5b
>
> DVA[0]=<1:f296000:10000> [L0 zvol object] fletcher4 uncompressed LE
> contiguous unique single size=10000L/10000P birth=1749L/1749P fill=1
> cksum=200db48ae914:40163392a6e1f2a:62ad5c6b01c39d36:b1fa1b14d986fa82
>
> DVA[0]=<1:f2a6000:10000> [L0 zvol object] fletcher4 uncompressed LE
> contiguous unique single size=10000L/10000P birth=1749L/1749P fill=1
> cksum=1feb72709d3a:3f8dbd7a3f1e98f:ab207f926cc8b2fc:c9a1145f06e1f9ab
>
> DVA[0]=<1:f2c6000:10000> [L0 zvol object] fletcher4 uncompressed LE
> contiguous unique single size=10000L/10000P birth=1749L/1749P fill=1
> cksum=1fd90de96ff0:3f98dc852f15900:a3cd2aed016bc0a9:eb9f507ffe495f15
>
> DVA[0]=<1:f2b6000:10000> [L0 zvol object] fletcher4 uncompressed LE
> contiguous unique single size=10000L/10000P birth=1749L/1749P fill=1
> cksum=20361039de8d:40b6d13e4438295:75686dbb7da50937:3217ceae84d5b538
>
> DVA[0]=<0:ea26000:10000> [L0 zvol object] fletcher4 uncompressed LE
> contiguous unique single size=10000L/10000P birth=1749L/1749P fill=1
> cksum=202a397fecd2:405130f54a9a83d:cda024b471659627:edf740c0ca1563eb
>
> HOLE [L0 unallocated] size=200L birth=0L
>
> HOLE [L0 unallocated] size=200L birth=0L
>
> HOLE [L0 unallocated] size=200L birth=0L
>
> ...
>
> HOLE [L0 unallocated] size=200L birth=0L
>
> HOLE [L0 unallocated] size=200L birth=0L
>
> HOLE [L0 unallocated] size=200L birth=0L
>
> The uncompressed data is likely due to the /dev/urandom source. The volume
> does have lz4 compression set (and had before the overwrite - inherited
> from the pool).
>
> By induction, a similar issue is likely to arise with an Ln hole when it
> is partially overwritten with non-hole block pointers. The remainder of the
> new indirect block allocated in place of the Ln hole needs to be
> backfilled with Ln-1 holes with the same birth epoch as the original Ln
> hole.
>
> At this time, it is not clear to me how this is best accomplished. Any
> pointers are highly appreciated.
>
>
> Best regards,
> Boris.
>
> ------------------------------
> *From:* Matthew Ahrens <mahr...@delphix.com>
> *Sent:* Monday, November 16, 2015 5:14 PM
> *To:* Boris
> *Cc:* zfs-de...@list.zfsonlinux.org; developer@open-zfs.org
> *Subject:* Re: [OpenZFS Developer] zfs send not detecting new holes
>
>
>
> On Mon, Nov 16, 2015 at 4:36 AM, Boris <bprotopo...@hotmail.com> wrote:
>
>> I should have been more specific: in my case I see the problem with
>> zvols. The first snapshot has a non-zero block, the next snapshot has the
>> block overwritten with zeros, but the stream lacks the free record. The zvol
>> is ~1.2T, 64k block size, sparse, has lz4 compression on.
>>
>
> In that case I don't think your problem is related to the bug I mentioned,
> which only has to do with objects that have been reallocated.  You must be
> seeing a different issue.  We also can not reproduce your issue with a
> simple test case.
>
> --matt
>
>
>>
>> Typos courtesy of my iPhone
>>
>> On Nov 15, 2015, at 12:25 PM, Matthew Ahrens <mahr...@delphix.com> wrote:
>>
>> btw, here is the bug you're asking about:
>> https://www.illumos.org/issues/6370
>>
>> --matt
>>
>> On Sun, Nov 15, 2015 at 9:24 AM, Matthew Ahrens <mahr...@delphix.com>
>> wrote:
>>
>>> We have a fix for this that we need to upstream.  We are waiting on code
>>> reviews for another change to send/receive:
>>>
>>> https://github.com/openzfs/openzfs/pull/23
>>> 6393 zfs receive a full send as a clone
>>>
>>> I'll probably stop waiting soon and RTI it, then we can get our fix for
>>> this in.
>>>
>>> --matt
>>>
>>> On Sun, Nov 15, 2015 at 8:37 AM, Boris <bprotopo...@hotmail.com> wrote:
>>>
>>>> Hi, guys,
>>>>
>>>>
>>>> I've been looking at an issue where sometimes, after non-zero data blocks
>>>> are overwritten with zero blocks with compression on, the corresponding
>>>> incremental send stream does not include the FREE record for those blocks.
>>>> The zdb -ddddddd output seems to indicate that the blocks in question have
>>>> never been written (the offsets for them are not listed in the output).
>>>>
>>>>
>>>> This looks like the issue addressed by
>>>>
>>>>
>>>> commit a4069eef2e403a3b2a307b23b7500e2adc6ecae5
>>>>
>>>> Author: Prakash Surya <prakash.su...@delphix.com>
>>>>
>>>> Date:   Fri Mar 27 13:03:22 2015 +1100
>>>>
>>>>
>>>>     Illumos 5695 - dmu_sync'ed holes do not retain birth time
>>>>
>>>>
>>>> but I certainly do have that commit. I have experimented with
>>>> overwriting blocks at different offsets, ranges of blocks spanning L1 and
>>>> L2 block pointers, but I cannot reproduce the issue.
>>>>
>>>>
>>>> Any suggestions for directions to look? Perhaps for a way to shape the
>>>> block tree such that this problem could arise?
>>>>
>>>>
>>>> Best regards,
>>>>
>>>> Boris.
>>>>
>>>> _______________________________________________
>>>> developer mailing list
>>>> developer@open-zfs.org
>>>> http://lists.open-zfs.org/mailman/listinfo/developer
>>>>
>>>>
>>>
>>
>
