On Tue, May 19, 2015 at 09:40:05AM -0400, Theodore Ts'o wrote:
> On Mon, May 18, 2015 at 03:58:24PM -0700, [email protected] wrote:
> >
> > I recently had my server's filesystem implode, and I'm currently in the
> > process of cleaning it up.  It had widespread corruption in files and
> > directories scattered across the filesystem, though all vaguely recently
> > changed.  Directories appeared corrupted or truncated, various files
> > showed up as piles of NULs, and 5000+ files and directories ended up in
> > lost+found.  I observed this corruption shortly after a reboot into
> > 4.0.2 (from a previous kernel of 3.16), with ext4 noticing an
> > inconsistency and mounting the filesystem read-only.  The underlying
> > disks had no errors.
> >
> > Reading about the corruption issue fixed by
> > d2dc317d564a46dfc683978a2e5a4f91434e9711 ("ext4: fix data corruption
> > caused by unwritten and delayed extents"), it sounds plausible.  Can
> > that strike both file data and directory data, assuming all of that data
> > ended up grouped with a delayed extent?  Would that bug manifest as
> > corrupted directories and files filled with NULs?  The system is a
> > 72-way server on which I was doing piles of parallel git pulls and
> > builds, so hitting a race seems plausible.
>
> Unfortunately, I don't think you can blame all of your problems on the
> bug fixed by this particular commit.  First of all, it doesn't apply to
> directories at all; secondly, it's been around for a long time.  I'd
> have to check and see whether or not 3.16 had the problem, but it
> wouldn't surprise me at all.  Finally, git pulls and builds are not
> at all likely to hit the problem.
>
> It requires the combination of (a) writing to a portion of a file that
> was not previously allocated using buffered I/O, (b) an fallocate of a
> region of the file which is a superset of the region written in (a)
> before it has a chance to be written to disk, (c) waiting for the file
> data in (a) to be written out to disk (either via fsync or via the
> writeback daemons), and then (d) before the extent status cache gets
> pushed out of memory, another random write to a portion of the file
> covered by (a) -- in which case that specific portion of (a) could be
> replaced by all zeros.
>
> Even most database or torrent downloads are not likely to hit this
> pattern, since it requires an fallocate of a previously (and very
> recently) allocated region of a file using a buffered write.
> Torrent downloads will tend to fallocate the whole file in advance,
> and while Oracle or DB2 might intermix writes and fallocates, they
> don't fallocate previously written regions of the file, and they use
> direct I/O in any case.
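
For concreteness, here is how I read the (a)-(d) sequence above, as a
minimal Python sketch.  The offsets, sizes, and the replay_pattern name
are my own assumptions for illustration, not taken from the fix or from
fsx.  It only replays the access pattern; I'm not claiming it reproduces
the bug, since the real thing depends on writeback timing and on the
extent status cache, and the read-back here may simply be served from
the page cache.

```python
import os
import tempfile

BLOCK = 4096  # assumed block size, chosen only for illustration


def replay_pattern(path):
    """Replay steps (a)-(d) on `path`; return True if the data reads back intact."""
    payload = b"A" * BLOCK
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        # (a) buffered write into a region that was not previously allocated
        os.pwrite(fd, payload, 4 * BLOCK)
        # (b) fallocate a superset of that region before it reaches disk
        os.posix_fallocate(fd, 0, 16 * BLOCK)
        # (c) force the data from (a) out to disk
        os.fsync(fd)
        # (d) another write into part of the region covered by (a)
        os.pwrite(fd, b"B" * 512, 4 * BLOCK + 1024)
        os.fsync(fd)
        # Build what the block from (a) should now contain and compare.
        expected = bytearray(payload)
        expected[1024:1536] = b"B" * 512
        return os.pread(fd, BLOCK, 4 * BLOCK) == bytes(expected)
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print("intact:", replay_pattern(os.path.join(tmp, "victim")))
```

Nothing in a git pull or a build looks like this, which matches what you
said: the fallocate in (b) has to land on top of freshly written,
not-yet-flushed buffered data.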

Ah, thanks for the clarification. :(  In particular, I didn't realize
this was *only* the data of the delayed-extent-based files.

The bug here seems to have struck various recently-written files and
directories.  (Recent in days, not seconds, as far as I can tell; and it
isn't universal based on age.)  The initial symptom was ext4 noticing
that a directory was corrupt (truncated, IIRC) and immediately marking
the whole filesystem read-only.

> So it's pretty hard to hit this bug by accident, unless you happen to
> be using fsx, and even then, the only files that would get corrupted
> would be the files being written using fsx.  So I'm afraid you'll have
> to look farther afield, and consider other bugs as well as potential
> hardware problems before trusting the system again.

I'm quite skeptical of hardware problems.  The system is a few months
old, well past infant-mortality and too young for burnout.  And I've
tested the disks carefully.

Are there any other known bugs that seem likely to fit the symptoms and
circumstances?  Note that since I saw this after rebooting from 3.16
into 4.0.2, I don't know whether the corruption was more likely caused
by 3.16 or 4.0.2.

> P.S.  It's bugs like these which are why I'm always amused by people
> who think that just because a file system is safely being used by
> their developers, that it's safe to throw production workloads on
> them.

Heh.  Yeah, I like exciting new software in most areas, but not in
filesystems.  In filesystems I prefer boring. :)

> These sorts of subtle data corruptors tend to be highly timing
> dependent, and very hard to find.  Sometimes these bugs can hang around
> for years before they are found and fixed.  The flip side is that
> fortunately, they tend to strike very rarely.

...lucky me.

> It's also why I'm very
> grateful for developers like Jan and Lukas.  :-)

Indeed.

- Josh Triplett

