Re: [developer] Improvements to 6513 handling

Boris Fri, 08 Jul 2016 10:31:58 -0700

So, as we found out a short time ago, this is actually fixed in ZoL as well, 
but only after the 0.6.5.7 release.

________________________________
From: Rich <rincebr...@gmail.com>
Sent: Friday, July 8, 2016 12:37:52 PM
To: develo...@lists.open-zfs.org
Cc: developer; Developer Lists Illumos
Subject: Re: [developer] Improvements to 6513 handling

Hi Boris,
I now have working code that implements this feature, and defaults to ignoring 
hole_birth data for sends if this feature is not enabled; I'm going to post 
patches for it after I've tested it on both ZoL and illumos.

The default behavior is easily changed, but "always correct and marginally less 
efficient for old data" seemed a better default than "usually correct".

For a filesystem that prides itself on not allowing silent corruption, 
requiring manual detection and intervention for correctness seems unreasonable 
to me.

- Rich

On Fri, Jul 8, 2016 at 12:15 PM, Boris <bprotopo...@hotmail.com> wrote:

Hi, Rich,

I agree that an unconditional switch using the tunable is heavy-handed if done 
'as a matter of standard practice', as opposed to 'for a short time, to fix up 
the known corrupted backups'.

To clarify the earlier suggestions, people with data affected by the bug can do 
one of two things:

1) install the code with the 6513 fix and the patch that adds the tunable, 
temporarily turn off the hole birth optimization, resend the 'difference' (a 
selected subset of incrementals) affected by the problem, then turn the 
optimization back on;

2) install the code with the 6513 fix but without the tunable patch, and do a 
full send of the affected snapshots.

For 2), the non-incremental send would need to happen only once per affected 
snapshot lineage. Once the missed holes are reinstated by the full send, the 
new, fixed code will perform proper incremental sends.
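The "once per lineage" argument can be checked against a small, hypothetical model. The types and functions below are invented for illustration and greatly simplify the real on-disk layout. The receiver starts out already corrupted by the old bug; a single full send repairs it, and a later incremental made by the fixed code, which records correct birth times, keeps it in sync even though the old hole's birth still reads as 0 on the source.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified model: NOT the real blkptr_t or send code. */
#define	NBLK	4

typedef struct {
	int is_hole;     /* 1 if this entry describes a hole */
	uint64_t birth;  /* txg in which the block or hole was born */
	char data;       /* payload for data blocks, 0 for holes */
} blk_t;

/* A full send transfers every block and hole, regardless of birth txg. */
static void
send_full(const blk_t *src, blk_t *dst)
{
	for (int i = 0; i < NBLK; i++)
		dst[i] = src[i];
}

/* An incremental send transfers only what was born after fromtxg. */
static int
send_incremental(const blk_t *src, blk_t *dst, uint64_t fromtxg)
{
	int sent = 0;

	for (int i = 0; i < NBLK; i++) {
		if (src[i].birth > fromtxg) {
			dst[i] = src[i];
			sent++;
		}
	}
	return (sent);
}

/* Compare what a reader would see: hole-ness and payload. */
static int
contents_equal(const blk_t *a, const blk_t *b)
{
	for (int i = 0; i < NBLK; i++) {
		if (a[i].is_hole != b[i].is_hole || a[i].data != b[i].data)
			return (0);
	}
	return (1);
}

/*
 * Block 2 is a hole on the source whose birth was lost (recorded as 0),
 * but the receiver still holds stale data there. A one-time full send of
 * @b (txg 20) repairs it; the fixed code then frees block 0 in txg 25
 * with a correct birth, and an incremental @b -> @c send follows.
 * Returns 1 if the receiver matches the source at the end.
 */
static int
full_send_then_incremental(int *records_sent)
{
	blk_t src[NBLK] = {
		{ 0, 5, 'A' }, { 0, 5, 'B' }, { 1, 0, 0 }, { 0, 5, 'D' }
	};
	blk_t dst[NBLK] = {
		{ 0, 5, 'A' }, { 0, 5, 'B' }, { 0, 5, 'C' }, { 0, 5, 'D' }
	};

	assert(!contents_equal(src, dst));  /* corrupted by the old bug */
	send_full(src, dst);                 /* one-time full send of @b */

	src[0].is_hole = 1;                  /* fixed code frees block 0 */
	src[0].data = 0;
	src[0].birth = 25;

	*records_sent = send_incremental(src, dst, 20);
	return (contents_equal(src, dst));
}
```

`full_send_then_incremental(&n)` returns 1 with `n == 1`: only the newly freed block 0 is sent, and the old hole needs no resend because the receiver already has it from the full send.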

1) is potentially more efficient in terms of resources (network bandwidth, etc.).

2) is potentially simpler from an operational standpoint: it does not require 
building and installing patched code, twiddling tunables, etc.

Boris.

________________________________
From: Rich <rincebr...@gmail.com>
Sent: Friday, July 8, 2016 11:26:27 AM
To: developer
Cc: Developer Lists Illumos
Subject: Re: [developer] Improvements to 6513 handling

Hi Boris,
A full send of the affected snapshots should be safe, AIUI - but that means 
people would need to do non-incremental snapshot sends to be certain of not 
hitting this bug, which becomes increasingly infeasible as your datasets grow.

If we're looking for the simplest solution without risk of data corruption, 
unconditionally ignoring the hole_birth data for doing a zfs send fits the 
bill, but seems a bit heavy-handed.

The feature-flag approach seemed like the best way to permit people to safely 
send older datasets while also permitting use of the hole_birth data going 
forward.

- Rich

On Fri, Jul 8, 2016 at 11:03 AM, Boris <bprotopo...@hotmail.com> wrote:

Hi, Rich,

Perhaps there is a simpler solution here.

I think that for the datasets affected by this bug, a full (not incremental) 
send of the source snapshot containing holes that the faulty incremental send 
code failed to transmit should fix the issue, as far as the on-disk layout is 
concerned.

Boris.

________________________________
From: Rich <rincebr...@gmail.com>
Sent: Thursday, July 7, 2016 8:30:54 PM
To: developer; Developer Lists Illumos
Subject: [developer] Improvements to 6513 handling

Hi all,

So, ZFS on Linux just noticed it was getting bitten by what ultimately turned 
out to be illumos #6513, partially filled holes losing their birth time.

Applying that fix prevents the problem for new data, but on every platform it 
does nothing for data already written to existing pools, which will keep 
getting silently munged in incremental sends forever.
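To make the failure mode concrete, here is a minimal C sketch under heavily simplified, hypothetical assumptions: a file is modeled as an array of blocks tagged with birth txgs (the `blk_t` type and every function name are invented for illustration, not taken from the ZFS source), and an incremental send copies only what was born after the source snapshot's txg. A freed block whose birth time is lost, assumed here to read back as 0, is then never sent.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified model: NOT the real blkptr_t or traversal code. */
#define	NBLK	4

typedef struct {
	int is_hole;     /* 1 if this entry describes a hole */
	uint64_t birth;  /* txg in which the block or hole was born */
	char data;       /* payload for data blocks, 0 for holes */
} blk_t;

/*
 * Incremental send from fromtxg: copy every block or hole born after
 * fromtxg to the receiver. A copied hole plays the role of a FREE record.
 * Returns the number of records sent.
 */
static int
send_incremental(const blk_t *src, blk_t *dst, uint64_t fromtxg)
{
	int sent = 0;

	assert(src != NULL && dst != NULL);
	for (int i = 0; i < NBLK; i++) {
		if (src[i].birth > fromtxg) {
			dst[i] = src[i];
			sent++;
		}
	}
	return (sent);
}

/* Compare what a reader would see: hole-ness and payload. */
static int
contents_equal(const blk_t *a, const blk_t *b)
{
	for (int i = 0; i < NBLK; i++) {
		if (a[i].is_hole != b[i].is_hole || a[i].data != b[i].data)
			return (0);
	}
	return (1);
}

/*
 * Snapshot @a (txg 10) holds four data blocks born in txg 5, and the
 * receiver already has an exact copy. In txg 15, block 2 becomes a hole;
 * if lose_birth is set, its birth is recorded as 0, mimicking the bug.
 * Returns 1 if the receiver matches the source after the incremental
 * @a -> @b send, 0 if it silently diverged.
 */
static int
receiver_matches(int lose_birth, int *records_sent)
{
	blk_t src[NBLK] = {
		{ 0, 5, 'A' }, { 0, 5, 'B' }, { 0, 5, 'C' }, { 0, 5, 'D' }
	};
	blk_t dst[NBLK];

	memcpy(dst, src, sizeof (src));     /* receiver holds @a */

	src[2].is_hole = 1;                  /* free block 2 in txg 15 */
	src[2].data = 0;
	src[2].birth = lose_birth ? 0 : 15;

	*records_sent = send_incremental(src, dst, 10);
	return (contents_equal(src, dst));
}
```

With the birth time intact, `receiver_matches(0, &n)` returns 1 with `n == 1`; with it lost, `receiver_matches(1, &n)` returns 0 with `n == 0`, and the receiver silently keeps the stale `'C'`.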

pcd pointed out that a relatively trivial workaround would be possible by 
simply ignoring the hole_birth metadata with something like a global tunable, 
but that seems too heavy-handed to me - either you're disabling the feature 
everywhere because you don't know when you can start trusting the birth times, 
or you're risking silent mangling of affected files forever.
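A rough sketch of the global-tunable workaround, under the same simplifying assumptions: `send_ignore_hole_birth` and the helper functions are hypothetical names, not the actual patch. It shows the trade-off described above: with the tunable set, every hole is resent on every incremental; with it clear, a hole whose birth reads as 0 is trusted and skipped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical global tunable; the name is illustrative only. */
static int send_ignore_hole_birth = 1;

/* Must an incremental send from fromtxg emit a FREE for this hole? */
static int
send_must_free_hole(uint64_t hole_birth, uint64_t fromtxg)
{
	if (send_ignore_hole_birth)
		return (1);     /* birth times never trusted: always resend */
	return (hole_birth > fromtxg);
}

/*
 * Count the FREE records an incremental send from fromtxg would emit
 * for holes whose birth txgs are births[0..n-1].
 */
static size_t
count_hole_frees(const uint64_t *births, size_t n, uint64_t fromtxg)
{
	size_t frees = 0;

	assert(births != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		frees += send_must_free_hole(births[i], fromtxg) ? 1 : 0;
	return (frees);
}
```

For holes with births `{ 0, 3, 7, 42 }` and a source snapshot at txg 40, `count_hole_frees` returns 4 with the tunable set and 1 with it clear; in the second case the birth-0 hole, which may be one of the damaged ones, is skipped.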

I'd like to suggest using a read-only compatible feature, called something like 
hole_birth_fix, in conjunction with the enabled_txg feature, to permit a 
reasonable default of ignoring hole_birth information for data written before 
hole_birth_fix was enabled, while still permitting its use afterward.

This has the unfortunate behavior of breaking write support if you enable 
hole_birth_fix and then try to go back to a prior codebase, but I can't think 
of a reasonable way to avoid this.
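The write-support caveat follows from how read-only compatible features are handled. This simplified sketch, with invented types and names, models the rule roughly as described in the zpool-features documentation: an active feature that the running code does not support forces a read-only import if it is read-only compatible, and blocks the import otherwise. It assumes `hole_birth_fix` would become active as soon as it is enabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum {
	IMPORT_READ_WRITE,
	IMPORT_READ_ONLY,
	IMPORT_REFUSED
} import_mode_t;

/* Illustrative feature descriptor, not the real zfeature_info_t. */
typedef struct {
	const char *name;
	bool active;     /* the feature's on-disk format is in use */
	bool ro_compat;  /* older code may still read the pool */
	bool supported;  /* the running code understands the feature */
} feature_t;

/*
 * Decide how the running code may import a pool with these features.
 * Unsupported features only matter once they are active.
 */
static import_mode_t
import_mode(const feature_t *features, size_t n)
{
	import_mode_t mode = IMPORT_READ_WRITE;

	assert(features != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		const feature_t *f = &features[i];

		if (!f->active || f->supported)
			continue;
		if (!f->ro_compat)
			return (IMPORT_REFUSED);
		mode = IMPORT_READ_ONLY;
	}
	return (mode);
}
```

An older codebase that does not know `hole_birth_fix` would get `IMPORT_READ_ONLY` for a pool where that feature is active, which is the downgrade problem described above.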

I filed illumos #7175 to track this proposal - I'll happily write the code to 
implement this shortly.

(Apologies if I've over-CCed or missed someone I should be asking for comment, 
I've not done this workflow before.)

- Rich

-------------------------------------------
openzfs-developer
Archives: https://www.listbox.com/member/archive/274414/=now