On 2016-06-27 08:33, ronnie sahlberg wrote:
On Sat, Jun 25, 2016 at 7:53 PM, Duncan <1i5t5.dun...@cox.net> wrote:
Chris Murphy posted on Sat, 25 Jun 2016 11:25:05 -0600 as excerpted:
Wow. So it sees the data strip corruption, uses good parity on disk to fix it, writes the fix to disk, recomputes parity for some reason but does it wrongly, and then overwrites good parity with bad parity? That's fucked. So in other words, if there are any errors fixed up during a scrub, you should do a 2nd scrub. The first scrub should make sure data is correct, and the 2nd scrub should make sure the bug is papered over by computing correct parity and replacing the bad parity.
I wonder if the same problem happens with balance, or if this is just a bug in the scrub code?
Could this explain why people have been reporting so many raid56-mode cases where btrfs replace on a first drive appears to succeed just fine, but then they go to btrfs replace a second drive and the array crashes, as if the first replace didn't work correctly after all, resulting in two bad devices once the second replace gets under way, of course bringing down the array?
If so, then it looks like we have our answer as to what has been going wrong that has been so hard to properly trace and thus to bugfix. Combine that with the raid4-style dedicated-parity-device behavior you're seeing when the writes are all exactly 128 MB, which may explain the super-slow replaces, and this thread may have just given us answers to both of those until-now-untraceable issues.
Regardless, what's /very/ clear by now is that raid56 mode as it currently exists is more or less fatally flawed, and a full scrap and rewrite to an entirely different raid56 on-disk format may be necessary to fix it.
And what's even clearer is that people /really/ shouldn't be using raid56 mode for anything but testing with throw-away data at this point. Anything else is simply irresponsible.
Does that mean we need to put a "raid56 mode may eat your babies"-level warning in the manpage and require a --force to either mkfs.btrfs or balance to raid56 mode? Because that's about where I am on it.
Agree. At this point letting ordinary users create raid56 filesystems
is counterproductive.
+1
I would suggest:
1, A much more strongly worded warning in the wiki. Make sure there are no misunderstandings: they really should not use raid56 right now for new filesystems.
I voiced my concern on #btrfs about this - it really should show that this may eat your data and is properly experimental. At the moment, it looks as if the features are implemented and working as expected. In my case, with nothing out of the ordinary, I've now got ~3.8 TB free disk space. Certainly not ready for *ANY* kind of public use.
2, Instead of a --force flag (users tend to ignore --force and warnings in documentation), ifdef out the options to create raid56 in mkfs.btrfs. Developers who want to test can just remove the ifdef and recompile the tools anyway. But if end-users have to recompile userspace, that really forces the point that "you really should not use this right now".
I think this is a somewhat good idea - however, it should be a warning along the lines of:
"BTRFS RAID56 is VERY experimental and is known to corrupt data in certain cases. Use at your own risk!
Continue? (y/N):"
3, Reach out to the documentation teams and fora for the major distros and make sure they update their documentation accordingly. I think a lot of end-users, if they try to research something, are more likely to go to <their-distro>'s fora and wiki than to search out an upstream forum.
Another good idea. I'd also recommend updates to the ArchLinux wiki - for some reason I always seem to end up there when searching for a certain topic...
--
Steven Haigh
Email: net...@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html