That problem makes sense to me, and your proposed solution seems correct in principle. I'm not totally sure if there's anything else that relies on the existing behavior in dbuf_read_impl().
--matt

On Tue, Nov 17, 2015 at 2:19 PM, Boris <bprotopo...@hotmail.com> wrote:

> So, I am thinking to try modifying dbuf_read_impl() to fill the arc buf
> with appropriate-level holes instead of doing bzero() when reading
> metadata-level holes whose birth epoch is greater than zero (or greater
> than the hole_birth feature txg, perhaps).
>
> Boris.
>
> ------------------------------
> *From:* Boris <bprotopo...@hotmail.com>
> *Sent:* Tuesday, November 17, 2015 3:00 PM
> *To:* Matthew Ahrens
> *Cc:* developer@open-zfs.org; zfs-de...@list.zfsonlinux.org
> *Subject:* Re: [OpenZFS Developer] zfs send not detecting new holes
>
> Hi, Matt,
>
> I believe I did reproduce the problem. The difficulty was really with
> creating an L1 hole, which I managed with a zfs recv of an empty L1 range
> from one zvol to another. The target zvol had an L1 hole in place of the
> L1 range filled with L0 holes in the source zvol.
>
> The issue I see is as follows (the datasets have compression on, and the
> pool has the hole_birth feature active). If the L1 hole is later
> partially overwritten with non-zero data, a new L1 block is allocated and
> partially filled in with new L0 block pointers pointing to non-zero
> blocks. Unfortunately, the rest of the L1 block appears to be left
> initialized with zeros (to zdb, it looks like a bunch of holes with birth
> epoch 0). This is the wrong thing to do: the hole at the end of the L1
> range in question now looks "old", whereas it should retain the birth
> epoch of the original L1 hole ("new"). But it does not, so the next zfs
> send disregards this hole, which results in lost FREE record(s) in the
> corresponding zfs send stream.
>
> I have the datasets snapped and local; I can reproduce this problem and
> can dump any zdb data if needed. Here are some snippets of the zdb
> output.
>
> Before the overwrite:
>
>      7c0000  L0 1:2584800:10000 10000L/10000P F=1 B=346/346
>      7d0000  L0 1:2594800:10000 10000L/10000P F=1 B=346/346
>      7e0000  L0 1:25a4800:10000 10000L/10000P F=1 B=346/346
>      7f0000  L0 1:25b4800:10000 10000L/10000P F=1 B=346/346
>      800000  L1 4000L B=548
>     1000000  L1 0:7c23e00:400 1:7c2d200:400 4000L/400P F=128 B=1268/1268
>     1000000  L0 0:7808c00:10000 10000L/10000P F=1 B=1268/1268
>     1010000  L0 0:7818c00:10000 10000L/10000P F=1 B=1268/1268
>     1020000  L0 0:7828c00:10000 10000L/10000P F=1 B=1268/1268
>
> The L1 hole is at offset 800000. After the partial overwrite (10 blocks
> written at the beginning of the L1 range):
>
>      7d0000  L0 1:2594800:10000 10000L/10000P F=1 B=346/346
>      7e0000  L0 1:25a4800:10000 10000L/10000P F=1 B=346/346
>      7f0000  L0 1:25b4800:10000 10000L/10000P F=1 B=346/346
>      800000  L1 0:ea36000:600 1:f2d6000:600 4000L/600P F=10 B=1749/1749
>      800000  L0 1:f246000:10000 10000L/10000P F=1 B=1749/1749
>      810000  L0 1:f256000:10000 10000L/10000P F=1 B=1749/1749
>      820000  L0 1:f266000:10000 10000L/10000P F=1 B=1749/1749
>      830000  L0 1:f276000:10000 10000L/10000P F=1 B=1749/1749
>      840000  L0 1:f286000:10000 10000L/10000P F=1 B=1749/1749
>      850000  L0 1:f296000:10000 10000L/10000P F=1 B=1749/1749
>      860000  L0 1:f2a6000:10000 10000L/10000P F=1 B=1749/1749
>      870000  L0 1:f2c6000:10000 10000L/10000P F=1 B=1749/1749
>      880000  L0 1:f2b6000:10000 10000L/10000P F=1 B=1749/1749
>      890000  L0 0:ea26000:10000 10000L/10000P F=1 B=1749/1749
>     1000000  L1 0:7c23e00:400 1:7c2d200:400 4000L/400P F=128 B=1268/1268
>     1000000  L0 0:7808c00:10000 10000L/10000P F=1 B=1268/1268
>
> Dump of the new L1 block's contents:
>
>     # zdb tpool -R 0:ea36000:600:di
>     Found vdev: /dev/sdk1
>     DVA[0]=<1:f246000:10000> [L0 zvol object] fletcher4 uncompressed LE
>       contiguous unique single size=10000L/10000P birth=1749L/1749P fill=1
>       cksum=2041f382f58d:408f994ef048de7:daf0e1cadf74f53:47c3fdb952a1e13f
>     DVA[0]=<1:f256000:10000> [L0 zvol object] fletcher4 uncompressed LE
>       contiguous unique single size=10000L/10000P birth=1749L/1749P fill=1
>       cksum=20306f1938d5:403381ccd468d8a:713a193137858160:d91b1c5cecb306af
>     DVA[0]=<1:f266000:10000> [L0 zvol object] fletcher4 uncompressed LE
>       contiguous unique single size=10000L/10000P birth=1749L/1749P fill=1
>       cksum=1fe160444ab9:3fcbaeb1f31c86e:11198655b490d3b5:76cd3d278385af3e
>     DVA[0]=<1:f276000:10000> [L0 zvol object] fletcher4 uncompressed LE
>       contiguous unique single size=10000L/10000P birth=1749L/1749P fill=1
>       cksum=201cb0f386db:4035cd7deb41749:6b4d734a11ce04b5:1fbc2dc2f169dcae
>     DVA[0]=<1:f286000:10000> [L0 zvol object] fletcher4 uncompressed LE
>       contiguous unique single size=10000L/10000P birth=1749L/1749P fill=1
>       cksum=202d87c1a695:403d7ff6a1a6f53:66b049fa47216fb4:848b133855fab5b
>     DVA[0]=<1:f296000:10000> [L0 zvol object] fletcher4 uncompressed LE
>       contiguous unique single size=10000L/10000P birth=1749L/1749P fill=1
>       cksum=200db48ae914:40163392a6e1f2a:62ad5c6b01c39d36:b1fa1b14d986fa82
>     DVA[0]=<1:f2a6000:10000> [L0 zvol object] fletcher4 uncompressed LE
>       contiguous unique single size=10000L/10000P birth=1749L/1749P fill=1
>       cksum=1feb72709d3a:3f8dbd7a3f1e98f:ab207f926cc8b2fc:c9a1145f06e1f9ab
>     DVA[0]=<1:f2c6000:10000> [L0 zvol object] fletcher4 uncompressed LE
>       contiguous unique single size=10000L/10000P birth=1749L/1749P fill=1
>       cksum=1fd90de96ff0:3f98dc852f15900:a3cd2aed016bc0a9:eb9f507ffe495f15
>     DVA[0]=<1:f2b6000:10000> [L0 zvol object] fletcher4 uncompressed LE
>       contiguous unique single size=10000L/10000P birth=1749L/1749P fill=1
>       cksum=20361039de8d:40b6d13e4438295:75686dbb7da50937:3217ceae84d5b538
>     DVA[0]=<0:ea26000:10000> [L0 zvol object] fletcher4 uncompressed LE
>       contiguous unique single size=10000L/10000P birth=1749L/1749P fill=1
>       cksum=202a397fecd2:405130f54a9a83d:cda024b471659627:edf740c0ca1563eb
>     HOLE [L0 unallocated] size=200L birth=0L
>     HOLE [L0 unallocated] size=200L birth=0L
>     HOLE [L0 unallocated] size=200L birth=0L
>     ...
>     HOLE [L0 unallocated] size=200L birth=0L
>     HOLE [L0 unallocated] size=200L birth=0L
>     HOLE [L0 unallocated] size=200L birth=0L
>
> The uncompressed data is likely due to the /dev/urandom source. The
> volume does have lz4 compression set (and had it before the overwrite,
> inherited from the pool).
>
> By induction, a similar issue is likely to arise with an Ln hole when it
> is partially overwritten with non-hole block pointers. The remainder of
> the new indirect block allocated in place of the Ln hole needs to be
> backfilled with L(n-1) holes with the same birth epoch as the original
> Ln hole.
>
> At this time, it is not clear to me how this is best accomplished. Any
> pointers are highly appreciated.
>
> Best regards,
> Boris.
>
> ------------------------------
> *From:* Matthew Ahrens <mahr...@delphix.com>
> *Sent:* Monday, November 16, 2015 5:14 PM
> *To:* Boris
> *Cc:* zfs-de...@list.zfsonlinux.org; developer@open-zfs.org
> *Subject:* Re: [OpenZFS Developer] zfs send not detecting new holes
>
> On Mon, Nov 16, 2015 at 4:36 AM, Boris <bprotopo...@hotmail.com> wrote:
>
>> I should have been more specific: in my case I see the problem with
>> zvols. The first snapshot has a non-zero block, the next snapshot has
>> the block overwritten with zeros, but the stream lacks the FREE record.
>> The zvol is ~1.2T, 64k block size, sparse, and has lz4 compression on.
>
> In that case I don't think your problem is related to the bug I
> mentioned, which only has to do with objects that have been reallocated.
> You must be seeing a different issue. We also cannot reproduce your
> issue with a simple test case.
>
> --matt
>
>> Typos courtesy of my iPhone
>>
>> On Nov 15, 2015, at 12:25 PM, Matthew Ahrens <mahr...@delphix.com> wrote:
>>
>> btw, here is the bug you're asking about:
>> https://www.illumos.org/issues/6370
>>
>> --matt
>>
>> On Sun, Nov 15, 2015 at 9:24 AM, Matthew Ahrens <mahr...@delphix.com>
>> wrote:
>>
>>> We have a fix for this that we need to upstream. We are waiting on
>>> code reviews for another change to send/receive:
>>>
>>> https://github.com/openzfs/openzfs/pull/23
>>> 6393 zfs receive a full send as a clone
>>>
>>> I'll probably stop waiting soon and RTI it; then we can get our fix
>>> for this in.
>>>
>>> --matt
>>>
>>> On Sun, Nov 15, 2015 at 8:37 AM, Boris <bprotopo...@hotmail.com> wrote:
>>>
>>>> Hi, guys,
>>>>
>>>> I've been looking at an issue where sometimes, after non-zero data
>>>> blocks are overwritten with zero blocks with compression on, the
>>>> corresponding incremental send stream does not include the FREE
>>>> record for those blocks. The zdb -ddddddd output seems to indicate
>>>> that the blocks in question have never been written (the offsets for
>>>> them are not listed in the output).
>>>>
>>>> This looks like the issue addressed by
>>>>
>>>>     commit a4069eef2e403a3b2a307b23b7500e2adc6ecae5
>>>>     Author: Prakash Surya <prakash.su...@delphix.com>
>>>>     Date:   Fri Mar 27 13:03:22 2015 +1100
>>>>
>>>>         Illumos 5695 - dmu_sync'ed holes do not retain birth time
>>>>
>>>> but I certainly do have that commit. I have experimented with
>>>> overwriting blocks at different offsets and with ranges of blocks
>>>> spanning L1 and L2 block pointers, but I cannot reproduce the issue.
>>>>
>>>> Any suggestions for directions to look? Perhaps for a way to shape
>>>> the block tree such that this problem could arise?
>>>>
>>>> Best regards,
>>>> Boris.
>>>>
>>>> _______________________________________________
>>>> developer mailing list
>>>> developer@open-zfs.org
>>>> http://lists.open-zfs.org/mailman/listinfo/developer
_______________________________________________
developer mailing list
developer@open-zfs.org
http://lists.open-zfs.org/mailman/listinfo/developer