So, as we found out a short time ago, this is actually fixed in ZoL as well, but after the 0.6.5.7
________________________________
From: Rich <rincebr...@gmail.com>
Sent: Friday, July 8, 2016 12:37:52 PM
To: develo...@lists.open-zfs.org
Cc: developer; Developer Lists Illumos
Subject: Re: [developer] Improvements to 6513 handling

Hi Boris,

I now have working code that implements this feature, and defaults to ignoring hole_birth data for sends if this feature is not enabled; I'm going to post patches for it after I've tested it on both ZoL and illumos.

The latter portion of the above is easily changed, but "always correct and marginally less efficient for old data" seemed a better default than "usually correct".

For a filesystem that prides itself on not allowing silent corruption, requiring manual detection and intervention for correctness seems unreasonable to me.

- Rich

On Fri, Jul 8, 2016 at 12:15 PM, Boris <bprotopo...@hotmail.com> wrote:

Hi, Rich,

I agree that unconditional switch using the tunable is heavy if done 'as a matter of standard practice' as opposed to 'for a short time, to fixup the known corrupted backups'.

To clarify the earlier suggestions, people with data affected by the bug can do two things:

1) install the code with 6513 fix and the patch with the tunable, then temporarily turn off the hole birth optimization, resend the 'difference' (a selected subset of incrementals) affected by the problem, then turn the optimization back on

2) install the code with 6513 fix without the patch, do a full send of the affected snapshots

For 2) the non-incremental send would need to happen only once per the affected snapshot lineage. Once the missed holes are re-instated with the full send, the new fixed code will perform proper incremental sends.

1) is potentially more optimal in terms of resources (network bw, etc.)
2) is potentially simpler from the operational standpoint, does not require building/installing patched code, twiddling the tunables, etc.

Boris.
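To make the two remediation options above concrete, here is a minimal toy model in Python. It is not ZFS code: the `Block` type, the function names, and the convention that a lost hole birth time shows up as `birth_txg == 0` are all assumptions made for illustration. It only tries to show why an incremental send that trusts hole birth times can leave stale data on the receiver, and why either ignoring hole birth times or doing one full send repairs it.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Block:
    """Toy block pointer: payload (None means a hole) plus a birth txg."""
    data: Optional[bytes]
    birth_txg: int  # assumption: 0 models a hole whose birth time was lost

    @property
    def is_hole(self) -> bool:
        return self.data is None


Snapshot = Dict[int, Block]              # offset -> block pointer
Contents = Dict[int, Optional[bytes]]    # offset -> payload as seen by a reader


def contents(snapshot: Snapshot) -> Contents:
    """Logical contents of a snapshot, with birth times stripped."""
    return {offset: blk.data for offset, blk in snapshot.items()}


def incremental_send(snapshot: Snapshot, from_txg: int,
                     ignore_hole_birth: bool) -> Snapshot:
    """Select the blocks an incremental send would put in the stream.

    With ignore_hole_birth=False, a hole is sent only if its recorded birth
    txg is newer than from_txg. With ignore_hole_birth=True, every hole is
    sent regardless of its recorded birth txg.
    """
    stream: Snapshot = {}
    for offset, blk in snapshot.items():
        if blk.is_hole and ignore_hole_birth:
            stream[offset] = blk
        elif blk.birth_txg > from_txg:
            stream[offset] = blk
    return stream


def full_send(snapshot: Snapshot) -> Snapshot:
    """A full send carries every block, so birth times do not matter."""
    return dict(snapshot)


def receive(target: Contents, stream: Snapshot, full: bool = False) -> Contents:
    """Apply a stream on top of the receiver's contents (or replace them)."""
    result: Contents = {} if full else dict(target)
    for offset, blk in stream.items():
        result[offset] = blk.data
    return result


# Snapshot taken at txg 10: two data blocks.
snap1: Snapshot = {0: Block(b"A", 5), 1: Block(b"B", 5)}
# Snapshot taken at txg 20: block 1 became a hole, but its birth time was lost.
snap2: Snapshot = {0: Block(b"A", 5), 1: Block(None, 0)}
```

In this model, the receiver that already holds `snap1` keeps the stale `b"B"` at offset 1 after a birth-time-trusting incremental, while option 1 (resend with hole birth times ignored) and option 2 (one full send) both bring it back in line with `snap2`.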
________________________________
From: Rich <rincebr...@gmail.com>
Sent: Friday, July 8, 2016 11:26:27 AM
To: developer
Cc: Developer Lists Illumos
Subject: Re: [developer] Improvements to 6513 handling

Hi Boris,

A full send of the affected snapshots should be safe, AIUI - but that means people would need to do non-incremental snapshot sends to be certain of not hitting this bug, which becomes increasingly infeasible as your datasets grow.

If we're looking for the simplest solution without risk of data corruption, unconditionally ignoring the hole_birth data for doing a zfs send fits the bill, but seems a bit heavy-handed.

This seemed like the best way to permit people to safely send older datasets while also permitting use of the hole_birth data going forward.

- Rich

On Fri, Jul 8, 2016 at 11:03 AM, Boris <bprotopo...@hotmail.com> wrote:

Hi, Rich,

perhaps there is a simpler solution here. I think for the datasets affected by this feature, a full (not incremental) send of the source snapshot that has some holes that have not been transmitted by the faulty incremental send code, should fix the issue, as far as the on-disk layout is concerned.

Boris.

________________________________
From: Rich <rincebr...@gmail.com>
Sent: Thursday, July 7, 2016 8:30:54 PM
To: developer; Developer Lists Illumos
Subject: [developer] Improvements to 6513 handling

Hi all,

So, ZFS on Linux just noticed it was getting bitten by what ultimately turned out to be Illumos #6513, partially filled holes losing birth time.

Implementing that fix removes this problem for new data, but on all platforms, this doesn't help data already written on existing pools, getting munged silently in incremental sends forever.

pcd pointed out that a relatively trivial workaround would be possible by simply ignoring the hole_birth metadata with something like a global tunable, but that seems too heavy-handed to me - either you're disabling the feature everywhere because you don't know when you can start trusting the birth times, or you're risking silent mangling of affected files forever.

I'd like to suggest using a read-compatible feature, call it something like hole_birth_fix, in conjunction with the enabled_txg feature, to permit a reasonable default of ignoring hole_birth information before the hole_birth_fix feature was enabled, but still permitting use of it afterward.

This has the unfortunate behavior of breaking write support if you enable hole_birth_fix and then try to go back to a prior codebase, but I can't think of a reasonable way to avoid this.

I filed illumos #7175 to track this proposal - I'll happily write the code to implement this shortly.

(Apologies if I've over-CCed or missed someone I should be asking for comment, I've not done this workflow before.)

- Rich
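One possible reading of the hole_birth_fix / enabled_txg proposal, sketched as a small Python decision function. This is an illustration only, not the patch discussed in the thread: `PoolFeatures`, `can_skip_hole`, and the exact comparison rules are assumptions. The idea is that a hole's recorded birth time is only used to skip the hole when the incremental's starting txg is at or after the txg at which the (hypothetical) feature was enabled; otherwise the hole is always sent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PoolFeatures:
    """Toy stand-in for pool feature state (hypothetical, for illustration)."""
    # txg at which hole_birth_fix was enabled; None means it never was.
    hole_birth_fix_enabled_txg: Optional[int] = None


def can_skip_hole(features: PoolFeatures, hole_birth_txg: int,
                  from_txg: int) -> bool:
    """Return True if an incremental send from from_txg may omit this hole.

    Assumed rules:
      * Feature never enabled: no hole birth time is trusted, send the hole.
      * from_txg older than the enabling txg: the hole may have been created
        by pre-fix code between from_txg and the enabling txg with a wrong
        (too old) birth time, so send it.
      * Otherwise any mis-recorded hole was really born no later than the
        enabling txg, hence no later than from_txg, so the usual comparison
        against from_txg is safe.
    """
    enabled_txg = features.hole_birth_fix_enabled_txg
    if enabled_txg is None:
        return False
    if from_txg < enabled_txg:
        return False
    return hole_birth_txg <= from_txg
```

Under these assumptions, old data costs some extra hole records in the stream, while incrementals that start after the feature was enabled keep the hole_birth optimization.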