Austin S. Hemmelgarn posted on Tue, 01 Aug 2017 10:47:30 -0400 as excerpted:
> I think I _might_ understand what's going on here. Is that test program
> calling fallocate using the desired total size of the file, or just
> trying to allocate the range beyond the end to extend the file? I've
> seen issues with the first case on BTRFS before, and I'm starting to
> think that it might actually be trying to allocate the exact amount of
> space requested by fallocate, even if part of the range is already
> allocated space.

If I've correctly interpreted previous discussions I've seen on this list (not being a dev, only a btrfs user, sysadmin, and list regular)...

That's exactly what it's doing, and it's _intended_ behavior.

The reasoning is something like this: fallocate is supposed to pre-allocate some space, with the intent being that writes into that space won't fail, because the space is already allocated.

For an existing file with some data already in it, ext4 and xfs do that counting the existing space.

But btrfs is copy-on-write, meaning it's going to have to write the new data to a different location than the existing data, and it may well not free up the existing allocation (if even a single 4k block of the existing allocation remains unwritten, it will remain to hold down the entire previous allocation, which isn't released until *none* of it is still in use -- of course in normal usage "in use" can be due to old snapshots or other reflinks to the same extent, as well, tho in these test cases it's not).

So in order to provide the writes-to-preallocated-space-shouldn't-ENOSPC guarantee, btrfs can't count currently actually used space as part of the fallocate.

The different behavior is entirely due to btrfs being COW, and thus a choice having to be made: do we worst-case fallocate-reserve for writes over currently used data that will have to be COWed elsewhere, possibly without freeing the existing extents because there's still something referencing them, or do we risk ENOSPCing on write to a previously fallocated area?
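
To put rough numbers on that trade-off, here is a toy model in Python. It is only a sketch of the accounting as described above, not of what the kernel actually does internally; the function name space_needed and the sizes are made up for illustration.

```python
GIB = 1024 ** 3


def space_needed(requested_len, already_allocated, cow_worst_case):
    """Toy model: free space an fallocate over a range has to find.

    requested_len     -- length of the range passed to fallocate
    already_allocated -- bytes of that range that already hold data
    cow_worst_case    -- True for the reserve-everything model described
                         for btrfs, False for the count-existing-data model
                         described for ext4/xfs
    """
    if cow_worst_case:
        # Overwrites get COWed elsewhere and the old extents may stay
        # pinned, so existing data does not reduce the reservation.
        return requested_len
    # Existing data already backs part of the range.
    return max(0, requested_len - already_allocated)


# Hypothetical numbers: a 6 GiB file, fallocate to 8 GiB total, 3 GiB free.
requested = 8 * GIB
existing = 6 * GIB
free = 3 * GIB

for label, worst_case in (("count existing data", False),
                          ("COW worst case", True)):
    need = space_needed(requested, existing, worst_case)
    verdict = "fits" if need <= free else "ENOSPC"
    print(f"{label}: needs {need // GIB} GiB, "
          f"{free // GIB} GiB free -> {verdict}")
```

With those made-up numbers, the first model needs 2 GiB and fits, while the worst-case model needs the full 8 GiB and fails at fallocate time.
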
The choice was to worst-case-reserve and take the ENOSPC risk at fallocate time, so the write into that fallocated space could then proceed without the ENOSPC risk that COW would otherwise imply.

Make sense, or is my understanding a horrible misunderstanding? =:^)

So if you're actually only appending, fallocate the /additional/ space, not the /entire/ space, and you'll get what you need. But if you're potentially overwriting what's there already, better fallocate the entire space, which triggers the btrfs worst-case allocation behavior you see, in order to guarantee it won't ENOSPC during the actual write.

Of course the only time the behavior actually differs is with COW, but then there's a BIG difference, and that BIG difference has a GOOD BIG reason! =:^)

Tho that difference will certainly necessitate some relearning of the /correct/ way to do it, for devs who were doing it the COW-worst-case way all along, even if they didn't actually need to, because it didn't happen to make a difference on what they happened to be testing on, which happened not to be COW...

Reminds me of the way newer versions of gcc and/or trying to build with clang as well tend to trigger relearning, because newer versions are stricter in order to allow better optimization, and other implementations are simply different in what they're strict on, /because/ they're a different implementation.

Well, btrfs is stricter... because it's a different implementation that /has/ to be stricter... due to COW.

-- 
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
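
For the append-only case mentioned above, a minimal sketch in Python might look like the following. The helper names append_range and preallocate_for_append are hypothetical, and the sketch assumes the file is not sparse, so that its current size reflects data that is actually there. It uses os.posix_fallocate, which wraps posix_fallocate(3); whether a given filesystem supports that call is outside what this sketch checks.

```python
import os


def append_range(current_size, desired_total):
    """Return (offset, length) covering only the bytes past the current end.

    If the file is already at least desired_total bytes long, the length
    is 0 and there is nothing to preallocate.
    """
    if desired_total <= current_size:
        return current_size, 0
    return current_size, desired_total - current_size


def preallocate_for_append(fd, desired_total):
    """Preallocate just the /additional/ space needed to grow fd.

    This deliberately does NOT pass the range (0, desired_total), since
    that would also cover data that is already there.
    """
    current_size = os.fstat(fd).st_size
    offset, length = append_range(current_size, desired_total)
    if length > 0:
        # Raises OSError (e.g. ENOSPC) if the space can't be reserved.
        os.posix_fallocate(fd, offset, length)
    return offset, length
```

If the program may overwrite existing data as well, this is the wrong helper: per the reasoning above, that case wants the fallocate over the entire range, with the larger reservation that implies on a COW filesystem.
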