Looks like Qu may have taken care of corrupted compressed data with
NODATASUM causing random kernel memory corruption.

As long as the compressed data was valid and could be uncompressed,
there were no problems, even on data marked NOCOW/NODATASUM.  If the
data being sent to be uncompressed was invalid and failed
decompression, it would sometimes give an I/O error, and sometimes
cause random kernel memory corruption.

I retraced my steps to try to figure out how my data got corrupted in
the first place.  The pattern of corruption didn't make sense as a
hardware problem, or as a badly executed dd by the user.

In short, "btrfs device replace" caused it.  When it copies data to a
new drive and encounters NOCOW/NODATASUM compressed data, it writes
the data to the new drive in uncompressed form, while the other mirror
still holds it in compressed form and both copies remain marked as
compressed.  I don't know how it handles the longer length.  I don't
know if it only writes out the compressed length, or if it writes out
the uncompressed length, possibly overwriting other data it's writing
out or, even worse, other file extents.
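To make the failure mode concrete, here is a small userspace model in
Python.  It is only a sketch: read_extent, ExtentReadError and the
sample data are made up for illustration, zlib is an assumed
compression algorithm, and nothing here is kernel code.  It also
ignores the open question above about how many bytes actually get
written.  It just shows why a copy holding uncompressed bytes but
still marked as compressed cannot be read back.

```python
import zlib


class ExtentReadError(Exception):
    """Stands in for the I/O error a read of an undecodable extent should give."""


def read_extent(stored: bytes, marked_compressed: bool) -> bytes:
    """Return the logical contents of one stored copy of an extent (hypothetical)."""
    if not marked_compressed:
        return stored
    try:
        return zlib.decompress(stored)
    except zlib.error as exc:
        # The well-behaved outcome: a clean error, nothing else touched.
        raise ExtentReadError(f"decompression failed: {exc}") from exc


def try_read(stored: bytes):
    """Read a copy that is marked compressed; return None if it cannot be decoded."""
    try:
        return read_extent(stored, marked_compressed=True)
    except ExtentReadError:
        return None


logical = b"systemd-journald entry\n" * 64

# Mirror 0: the original copy, compressed on disk.
mirror0 = zlib.compress(logical)
# Mirror 1: the replace target in this model, holding uncompressed
# bytes while the extent is still marked as compressed.
mirror1 = logical

results = [try_read(m) for m in (mirror0, mirror1)]
```

In this model results[0] is the original data and results[1] is None,
so whether a read succeeds depends entirely on which mirror happens to
be picked.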

To rule out anything else, I started in a fresh VM with the May 1,
2018, Arch installation ISO.  That's kernel 4.16.5, btrfs-progs 4.16.

Starting with a freshly partitioned disk with three 10GB partitions, a
fairly minimal reproducing case with plenty of explanatory comments
can be read here:

https://pastebin.com/VvNk90Wa

I of course don't know the extent of this.  I don't know all of the
situations where NOCOW/NODATASUM extents are compressed anyway.  In my
real world case, it was journald logs.  We know journald/systemd
submits those for defragmentation.  I haven't verified if it submits
the defragmentation asking for compression.  In my reproducing example
linked above, I had to defragment the file asking for compression to
cause the file to be compressed.  If that's the extent of the bug,
there are probably lots of journald logs out there that have been
through a replace and are corrupted, but hopefully no databases.  I
don't think any databases, and not many database administrators, are
going to submit the files for defragmentation with compression.  But
if compression can be triggered in more situations than this, it's
possible there's a lot of corruption (sometimes silent) out there on
important things like databases.

Obviously, btrfs device replace or something it depends on needs
fixing.  It's above my pay grade whether some type of alert should be
sent out saying not to use replace on btrfs-progs older than a new
version that hasn't come out yet.  That probably depends on how big
the extent of the bug is.

I also submit that even with corrupted compressed data no longer being
submitted for decompression, and even with btrfs device replace soon
being patched, there should be a way for all mirrored NODATASUM data
to have its copies compared, regardless of whether compression is
involved.  I think check or scrub should gain this functionality.
Obviously, without a checksum, no automatic repair can happen, but the
user can at least be alerted that something is wrong.  As the example
will show, if the corruption happens on the mirrored copy that isn't
read, it's silent corruption, at least until the good copy goes bad
someday.  Btrfs has a chance to give NODATASUM data more protection
than other filesystems do: something between plain mirroring, which
really only protects against a disk failure (what most implementations
do, and what btrfs does with NODATASUM data now), and btrfs'
checksummed mirrors, which guard against bit rot and accidental
corruption of one mirror.
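
As a rough illustration of the comparison I have in mind, here is a
Python sketch that takes two copies of the same data and reports the
byte ranges where they differ.  The name compare_mirrors and the 4096
byte block size are assumptions for the example.  This is not how
scrub or check work today, and a real implementation would have to
walk the extent tree and read each mirror from disk.

```python
from typing import List, Tuple

BLOCK_SIZE = 4096  # assumed comparison granularity, not taken from btrfs


def compare_mirrors(copy_a: bytes, copy_b: bytes,
                    block_size: int = BLOCK_SIZE) -> List[Tuple[int, int]]:
    """Return (offset, length) ranges where the two copies differ.

    Adjacent mismatching blocks are merged into a single range.  With
    no checksum there is no way to say which copy is the good one, so
    the result can only be reported, not repaired.
    """
    mismatches: List[Tuple[int, int]] = []
    longest = max(len(copy_a), len(copy_b))
    for offset in range(0, longest, block_size):
        end = offset + block_size
        if copy_a[offset:end] == copy_b[offset:end]:
            continue
        length = min(block_size, longest - offset)
        if mismatches and sum(mismatches[-1]) == offset:
            # Extend the previous range instead of starting a new one.
            start, prev_len = mismatches[-1]
            mismatches[-1] = (start, prev_len + length)
        else:
            mismatches.append((offset, length))
    return mismatches


# Two copies of a four-block extent; the second has damage in blocks 1 and 2.
good = bytes(4 * BLOCK_SIZE)
damaged = bytearray(good)
damaged[BLOCK_SIZE + 10] = 0xFF
damaged[2 * BLOCK_SIZE] = 0x01

report = compare_mirrors(good, bytes(damaged))
```

Here report holds a single range starting at offset 4096 and covering
8192 bytes, which is enough to alert the user even though neither copy
can be declared the correct one.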

I'd even be interested in writing such an add-on to check or scrub, if
it would be accepted, assuming it was written well and worked, of
course.  If someone else wants to do it, that's OK too.