Whoops, I hadn't intended to top-post... I'll do it correctly this time.

On Thu, Sep 25, 2025 at 12:57 PM james young <[email protected]> wrote:
>
> I'm not sure what timeline to expect for a response.
> Would a tarball of the outer image preserve everything needed for diagnostics?
>
> -James
>
> On Tue, Sep 23, 2025 at 2:10 PM james young <[email protected]> wrote:
> >
> > I hit an issue with btrfs compression; I reported it to Debian, which
> > I was using, and they suggested that I take it upstream.
> >
> > Thanks, Salvatore. My apologies to everyone if I misunderstood.
> >
> > -James
> >
> > On Tue, Sep 23, 2025 at 1:50 PM Salvatore Bonaccorso <[email protected]>
> > wrote:
> > >
> > > Control: tags -1 + moreinfo
> > >
> > > Hi James,
> > >
> > > On Tue, Sep 23, 2025 at 08:04:25PM +0200, James Young wrote:
> > > > Package: src:linux
> > > > Version: 6.1.129-1
> > > > Severity: normal
> > > > X-Debbugs-Cc: [email protected]
> > > >
> > > > Dear Maintainer,
> > > >
> > > >
> > > > * What led up to the situation?
> > > > We made empty files in a loop, in parallel, under CPU and I/O load.
> > > > We had an outer Btrfs image file with compression, which contained a
> > > > Btrfs image file, which contained billions of empty files.
> > > > We wrote around 100TB to the inner image file.
> > > > Around 60TB in, compression quietly shut off.
> > > > We ran out of space; both mounts presented i/o errors.
> > > >
> > > > * What exactly did you do (or not do) that was effective (or
> > > > ineffective)?
> > > > * I unmounted the inner and outer images.
> > > > I didn't take note of memory usage before this point.
> > > > * dump debug info for the outer image - `btrfs inspect-internal
> > > > dump-tree --dfs ...`
> > > > * We started a btrfsck. (twice, actually; breadth-first hit memory
> > > > limits, I think)
> > > > After that, I learned about `btrfs check`, but didn't interrupt the
> > > > btrfsck, due to Sunk Cost Fallacy.
> > > > The btrfsck is still running. It's of extremely dubious value now.
> > > > * check the kernel logs
> > > > * I grepped for btrfs, the mount points, compress, and zstd. I didn’t
> > > > find a smoking gun in the right timeframe.
> > > >
> > > > not done yet:
> > > > * mount the outer image
> > > > * rebooted
> > > > * tried a newer kernel. we're currently on kernel 6.1.129; we could go
> > > > to newer 6.1 or 6.12 kernels
> > > > * redo live file system compression, with e.g. `btrfs filesystem defrag
> > > > -czstd`
> > > > * fstrim the outer image
> > > >
> > > > goals:
> > > > * work out what happened.
> > > > How can we help?
> > > > * help avoid it happening again, to others
> > > > * salvage what we can
> > > >
> > > > I've run `bugreport` as a non-privileged user. Let me know if root
> > > > access would give a fuller picture.
> > >
> > > I believe the best thing you could do here is to contact actually
> > > upstream people directly. get_maintainers and the MAINTAINERS file
> > > has:
> > >
> > > BTRFS FILE SYSTEM
> > > M: Chris Mason <[email protected]>
> > > M: Josef Bacik <[email protected]>
> > > M: David Sterba <[email protected]>
> > > L: [email protected]
> > > S: Maintained
> > > W: https://btrfs.readthedocs.io
> > > Q: https://patchwork.kernel.org/project/linux-btrfs/list/
> > > C: irc://irc.libera.chat/btrfs
> > > T: git git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git
> > > F: Documentation/filesystems/btrfs.rst
> > > F: fs/btrfs/
> > > F: include/linux/btrfs*
> > > F: include/trace/events/btrfs.h
> > > F: include/uapi/linux/btrfs*
> > >
> > > So I would suggest you to contact above maintainers including the
> > > list.
> > >
> > > Please keep this downstream bugreport as well in the recipients list.
> > >
> > > Regards,
> > > Salvatore

I made a tarball of the file system, then mounted and looked at the file
systems. I attempted to recompress (with btrfs defrag) and fstrim, with
little success in freeing up space.

I started btrfs check with the progress option; within two hours, it had
gotten to “[2/7] checking extents, 82 items checked”. At first I confused
the extents with the compressed chunk length - 128KiB - so that seemed
woefully low on progress. Over a week later, it’s still "82 items
checked". It’s still taking CPU (3% right now) and gigs of memory; it’s
doing something, though slowly.

So, a few questions:
* is this business as usual for a btrfs check?
* is this a clue about what happened?
* is this a symptom?

If this is a useful metric for file system robustness, is this something
I could / should experiment with to shorten? For example:
* run `sync`
* periodically pause writes, to let the buffers empty

Any thoughts or suggestions?

-James
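
P.S. To make the `sync` / pause idea concrete, here is a rough sketch of what such an experiment could look like. It is not the workload we actually ran: the function names, file naming scheme, batch sizes and worker count are all made up for illustration, and it only assumes a Unix-like system where Python's `os.sync()` is available. It creates empty files in parallel, then syncs and optionally sleeps between batches.

```python
import os
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor


def create_empty_files(directory, start, count):
    """Create `count` empty files named by index; return how many were made."""
    for i in range(start, start + count):
        # Mode "x" creates the file and raises if it already exists.
        with open(os.path.join(directory, f"f{i:012d}"), "x"):
            pass
    return count


def run_batches(directory, batches, files_per_batch, workers=4, pause_s=0.0):
    """Create files in parallel, syncing and pausing between batches.

    Assumes files_per_batch is a multiple of workers (hypothetical choice
    to keep the index arithmetic simple).
    """
    total = 0
    per_worker = files_per_batch // workers
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for b in range(batches):
            base = b * files_per_batch
            futures = [
                pool.submit(create_empty_files, directory,
                            base + w * per_worker, per_worker)
                for w in range(workers)
            ]
            total += sum(f.result() for f in futures)
            os.sync()            # ask the kernel to write out dirty buffers
            time.sleep(pause_s)  # optional quiet period between batches
    return total


if __name__ == "__main__":
    # Scratch directory for a dry run; point this at a test mount instead.
    with tempfile.TemporaryDirectory() as d:
        made = run_batches(d, batches=3, files_per_batch=100, workers=4)
        print(made, len(os.listdir(d)))
```

The idea would be to vary `pause_s` and the batch size on a scratch image and see whether either changes how the file system behaves; I have no evidence yet that it would.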

