On Thu, Mar 7, 2019 at 8:14 PM Zygo Blaxell <ce3g8...@umail.furryterror.org> wrote:
>
> On Mon, Mar 04, 2019 at 04:34:39PM +0100, Christoph Anton Mitterer wrote:
> > Hey.
> >
> > Thanks for your elaborate explanations :-)
> >
> > On Fri, 2019-02-15 at 00:40 -0500, Zygo Blaxell wrote:
> > > The problem occurs only on reads. Data that is written to disk will
> > > be OK, and can be read correctly by a fixed kernel.
> > >
> > > A kernel without the fix will give corrupt data on reads with no
> > > indication of corruption other than the changes to the data itself.
> > >
> > > Applications that copy data may read corrupted data and write it back
> > > to the filesystem. This will make the corruption permanent in the
> > > copied data.
> >
> > So that basically means even a cp (without refcopy) or a btrfs
> > send/receive could already cause permanent silent data corruption.
> > Of course, only if the conditions you've described below are met.
> >
> > > Given the age of the bug
> >
> > Since when was it in the kernel?
>
> Since at least 2015. Note that if you are looking for an end date for
> "clean" data, you may be disappointed.
It's been around since compression was introduced (October 2008).
The readahead path was buggy for the case where the same compressed
extent is shared consecutively. I fixed two bugs there back in 2015 but
missed the case where there's a hole that makes the compressed extent
be shared with a non-zero start offset, which is the case that was
fixed recently.

> In 2016 there were two kernel bugs that silently corrupted reads of
> compressed data. In 2015 there were...4? 5? Before 2015 the problems
> are worse, also damaging on-disk compressed data and crashing the
> kernel. The bugs that were present in 2014 were present since
> compression was introduced in 2008.
>
> With this last fix, as far as I know, we have a kernel that can read
> compressed data without corruption for the first time--at least for a
> subset of use cases that doesn't include direct IO. Of course I
> thought the same thing in 2017, too, but I have since proven myself
> wrong.
>
> When btrfs gets to the point where it doesn't fail backup verification
> for some contiguous years, then I'll be satisfied btrfs (or any
> filesystem) is properly debugged. I'll still run backup verification
> then, of course--hardware breaks all the time, and broken hardware can
> corrupt any data it touches. Verification failures point to broken
> hardware much more often than btrfs data corruption bugs.
>
> > > Even if compression is enabled, the file data must be compressed
> > > for the bug to corrupt it.
> >
> > Is there a simple way to find files (i.e. pathnames) that were
> > actually compressed?
>
> Run compsize (sometimes the package is named btrfs-compsize) and see
> if there are any lines referring to zlib, zstd, or lzo in the output.
> If it's all "total" and "none" then there's no compression in that
> file.
>
> filefrag -v reports non-inline compressed data extents with the
> "encoded" flag, so
>
>     if filefrag -v "$file" | grep -qw encoded; then
>         echo "$file" is compressed, do something here
>     fi
>
> might also be a solution (assuming your filename doesn't include the
> string 'encoded').
>
> > > - you never punch holes in files
> >
> > Is there any "standard application" (like cp, tar, etc.) that would
> > do this?
>
> Legacy POSIX doesn't have the hole-punching concept, so legacy tools
> won't do it; however, people add features to GNU tools all the time,
> so it's hard to be 100% sure without downloading the code and
> reading/auditing/scanning it. I'm 99% sure cp and tar are OK.
>
> > What do you mean by clone? refcopy? Would btrfs snapshots or btrfs
> > send/receive be affected?
>
> clone is part of some file operation syscalls (e.g. clone_file_range,
> dedupe_range) which make two different files, or two different offsets
> in the same file, refer to the same physical extent. This is the basis
> of deduplication (replacing separate copies with references to a
> single copy) and also of punching holes (a single reference is split
> into two references to the original extent with a hole object inserted
> in the middle).
>
> "reflink copy" is a synonym for "cp --reflink", which is
> clone_file_range using 0 as the start of range and EOF as the end.
> The term 'reflink' is sometimes used to refer to any extent shared
> between files that is not the result of a snapshot. reflink is to
> extents what a hardlink is to inodes, if you ignore some details.
>
> To trigger the bug you need to clone the same compressed source range
> to two nearly adjacent locations in the destination file (i.e. two or
> more ranges in the source overlap). cp --reflink never overlaps
> ranges, so it can't create the extent pattern that triggers this bug
> *by itself*.
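To make the clone/hole-punch terminology above a bit more concrete,
here is a rough, untested sketch using xfs_io (from xfsprogs). It only
illustrates the kind of operations being discussed (cloning the same
compressed range to two nearby offsets and punching a hole); it is not
the reproducer from the actual fix, and the paths and sizes are made up:

    # assumes /mnt/scratch is a btrfs filesystem mounted with -o compress
    # write compressible data so btrfs creates a compressed extent
    xfs_io -f -c "pwrite -S 0x61 0 128K" /mnt/scratch/src
    sync

    # clone the same source range twice, to two nearby offsets of another file
    xfs_io -f -c "reflink /mnt/scratch/src 0 0 64K" \
              -c "reflink /mnt/scratch/src 0 96K 64K" /mnt/scratch/dst

    # punch a hole inside the first cloned range, leaving a tail reference
    # that starts at a non-zero offset into the compressed extent
    xfs_io -c "fpunch 16K 32K" /mnt/scratch/dst

    # inspect the resulting extent layout ("encoded" = compressed)
    filefrag -v /mnt/scratch/dst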
> If the source file already has extent references arranged in a way
> that triggers the bug, then the copy made with cp --reflink will copy
> the arrangement to the new file (i.e. if you upgrade the kernel, you
> can correctly read both copies, and if you don't upgrade the kernel,
> both copies will appear to be corrupted, probably the same way).
>
> I would expect btrfs receive may be affected, but I did not find any
> code in receive that would be affected. There are a number of
> different ways to make a file with a hole in it, and btrfs receive
> could use a different one not affected by this bug. I don't use
> send/receive myself, so I don't have historical corruption data to
> guess from.
>
> > Or is there anything in btrfs itself which does any of the two per
> > default or on a typical system (i.e. I didn't use dedupe)?
>
> 'btrfs' (the command-line utility) doesn't do these operations as far
> as I can tell. The kernel only does these when requested by
> applications.
>
> > Also, did the bug only affect data, or could metadata also be
> > affected... basically, should such filesystems be re-created since
> > they may also hold corruptions in the metadata, like trees and so on?
>
> Metadata is not affected by this bug. The bug only corrupts btrfs data
> (specifically, the contents of files) in memory, not on disk.
>
> > My scenario looks about like the following, and given your
> > explanations, I'd assume I should probably be safe:
> >
> > - my normal laptop doesn't use compress, so it's safe anyway
> >
> > - my cp has an alias to always have --reflink=auto
> >
> > - two 8TB data archive disks, each with two backup disks to which
> >   the data of the two master disks is btrfs sent/received,... which
> >   were all mounted with compress
> >
> > - typically I either cp or mv data from the laptop to these disks,
> >   => should then be safe, as the laptop fs didn't use compress,...
> >
> > - or I directly create the files on the data disks (which use
> >   compress) by means of wget, scp or similar from other sources
> >   => should be safe, too, as they probably don't do dedupe/hole
> >   punching by default
> >
> > - or I cp/mv from camera SD cards, which use some *FAT
> >   => so again I'd expect that to be fine
> >
> > - on vacation I had the case that I put large amounts of
> >   pictures/videos from SD cards onto some btrfs-with-compress mobile
> >   HDDs, and back home from these HDDs onto my actual data HDDs.
> >   => here I do have the read / re-write pattern, so data could have
> >   been corrupted if it was compressed + deduped/hole-punched.
> >   I'd guess that's anyway not the case (JPEGs/MPEGs don't compress
> >   well)... and AFAIU there would be no deduping/hole-punching
> >   involved here
>
> dedupe doesn't happen by itself on btrfs. You have to run dedupe
> userspace software (e.g. duperemove, bees, dduper, rmlint, jdupes,
> bedup, etc...) or build a kernel with dedupe patches.
>
> > - on my main data disks, I do snapshots... and these snapshots I
> >   send/receive to the other (also compress-mounted) btrfs disks.
> >   => could these operations involve deduping/hole-punching and thus
> >   the corruption?
>
> Snapshots won't interact with the bug--they are not affected by it
> and will not trigger it. Send could transmit incorrect data (if it
> uses the kernel's readpages path internally, I don't know if it does).
> Receive seems not to be affected (though it will not detect incorrect
> data from send).
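On the question of which files could even be affected: the filefrag
tip from earlier in the thread can be turned into a tree-wide scan.
A rough, untested sketch (the path is just an example; it only inspects
the flags column of the extent table, so a filename that happens to
contain 'encoded' won't cause a false positive):

    # print regular files that contain compressed ("encoded") extents
    find /mnt/archive -xdev -type f -print0 |
    while IFS= read -r -d '' f; do
        if filefrag -v "$f" 2>/dev/null |
            awk '$1 ~ /^[0-9]+:$/ && $NF ~ /encoded/ { found = 1 } END { exit !found }'
        then
            printf '%s\n' "$f"
        fi
    done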
> > Another thing:
> > I always store SHA512 hash sums of files as an XATTR of them (like
> > "directly after" creating such files).
> > I assume there would be no deduping/hole-punching involved till
> > then, so the sums should be from correct data, right?
>
> There's no assurance of that with this method. It's highly likely
> that the hashes match the input data, because the file will usually
> be cached in host RAM from when it was written, so the bug has no
> opportunity to appear. It's not impossible for other system activity
> to evict those cached pages between the copy and hash, so the hash
> function might reread the data from disk and thus be exposed to the
> bug.
>
> Contrast with a copy tool which integrates the SHA512 function, so
> the SHA hash and the copy consume their data from the same RAM
> buffers. This reduces the risk of undetected error but still does not
> eliminate it. A DRAM access failure could corrupt either the data or
> the SHA hash but not both, so the hash will fail verification later,
> but you won't know whether the hash is incorrect or the data.
>
> If the source filesystem is not btrfs (and therefore cannot have this
> btrfs bug), you can calculate the SHA512 from the source filesystem
> and copy that to the xattr on the btrfs filesystem. That reduces the
> risk pool for data errors to the host RAM and CPU, the source
> filesystem, and the storage stack below the source filesystem (i.e.
> the generic set of problems that can occur on any system at any time
> and corrupt data during copy and hash operations).
>
> > But when I e.g. copy data from SD, to mobile btrfs-HDD and then to
> > the final archive HDD... corruption could in principle occur when
> > copying from mobile HDD to archive HDD.
> > In that case, would a diff between the two show me the corruption?
> > I guess not, because the diff would likely get the same corruption
> > on read?
>
> Upgrade your kernel before doing any verification activity; otherwise
> you'll just get false results.
>
> If you try to replace the data before upgrading the kernel, you're
> more likely to introduce new corruption where corruption did not
> exist before, or convert transient corruption events into permanent
> data corruption. You might even miss corrupted data because the bug
> tends to corrupt data in a consistent way.
>
> Once you have a kernel with the fix applied, diff will show any
> corruption in file copies, though 'cmp -l' might be much faster than
> diff on large binary files. Use just 'cmp' if you only want to know
> if any difference exists but don't need detailed information, or
> 'cmp -s' in a shell script.
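Since scheduled verification comes up again just below, here is a
rough, untested sketch of such a periodic check. It assumes the xattr
is named user.sha512 and holds the bare hex digest (adjust the name
and path to your own scheme), and it should of course only be run on a
kernel that has the fix:

    # verify each file against the SHA-512 digest stored in its user.sha512 xattr
    find /mnt/archive -xdev -type f -print0 |
    while IFS= read -r -d '' f; do
        stored=$(getfattr --only-values -n user.sha512 "$f" 2>/dev/null) || continue
        current=$(sha512sum "$f" | awk '{ print $1 }')
        [ "$stored" = "$current" ] || printf 'MISMATCH: %s\n' "$f"
    done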
> > [...]
> > I assume normal mv or refcopy (i.e. cp --reflink=auto) would not
> > punch holes and thus not be affected?
> >
> > Further, I'd assume XATTRs couldn't be affected?
>
> XATTRs aren't compressed file data, so they aren't affected by this
> bug, which only affects compressed file data.
>
> > So what remains unanswered is send/receive:
> >
> > > btrfs send and receive may be affected, but I don't use them so I
> > > don't have any experience of the bug related to these tools. It
> > > seems from reading the btrfs receive code that it lacks any code
> > > capable of punching a hole, but I'm only doing a quick search for
> > > words like "punch", not a detailed code analysis.
> >
> > Is there some other developer who possibly knows whether
> > send/receive would have been vulnerable to the issue?
> >
> > But since I use send/receive anyway in just one direction from the
> > master to the backup disks... only the latter could be affected.
>
> I presume from this line of questioning that you are not in the habit
> of verifying the SHA512 hashes on your data every few weeks or
> months. If you had that step in your scheduled backup routine, then
> you would already be aware of data corruption bugs that affect
> you--or you'd already be reasonably confident that this bug has no
> impact on your setup.
>
> If you had asked questions like "is this bug the reason why I've been
> seeing random SHA hash verification failures for several years?" then
> you should worry about this bug; otherwise, it probably didn't affect
> you.
>
> > Thanks,
> > Chris.

--
Filipe David Manana,

"Whether you think you can, or you think you can't — you're right."