Hi Chris,

it seems that systemd-journald is smarter and more complex than I thought:

1) systemd-journald sets the "live" journal as NOCOW; *when* (see below) it closes the files, it marks them as COW again and then defragments them [1].

2) Looking at the code, I suspect that systemd-journald closes the files asynchronously [2]. This means that looking at the "live" journal is not sufficient. In fact:

/var/log/journal/e84907d099904117b355a99c98378dca$ sudo lsattr $(ls -rt *)
[...]
--------------------- user-1000@97aaac476dfc404f9f2a7f6744bbf2ac-000000000000bd4f-0005baed61106a18.journal
--------------------- system@3f2405cf9bcf42f0abe6de5bc702e394-000000000000bd64-0005baed659feff4.journal
--------------------- user-1000@97aaac476dfc404f9f2a7f6744bbf2ac-000000000000bd67-0005baed65a0901f.journal
---------------C----- system@3f2405cf9bcf42f0abe6de5bc702e394-000000000000cc63-0005bafed4f12f0a.journal
---------------C----- user-1000@97aaac476dfc404f9f2a7f6744bbf2ac-000000000000cc85-0005baff0ce27e49.journal
---------------C----- system@3f2405cf9bcf42f0abe6de5bc702e394-000000000000cd38-0005baffe9080b4d.journal
---------------C----- user-1000@97aaac476dfc404f9f2a7f6744bbf2ac-000000000000cd3b-0005baffe908f244.journal
---------------C----- user-1000.journal
---------------C----- system.journal

The output above means that the last 6 files are "pending" defragmentation. When they are "closed", the NOCOW flag will be removed and a defragmentation will start.

Right now my journals have only a few extents (2 or 3). But I have seen cases where the more recent files had hundreds of extents, and after a few "journalctl --rotate" the older files became less fragmented.

[1] https://github.com/systemd/systemd/blob/fee6441601c979165ebcbb35472036439f8dad5f/src/libsystemd/sd-journal/journal-file.c#L383
[2] https://github.com/systemd/systemd/blob/fee6441601c979165ebcbb35472036439f8dad5f/src/libsystemd/sd-journal/journal-file.c#L3687

On 2/10/21 7:37 AM, Chris Murphy wrote:
> This is an active (but idle) system.journal file. That is, it's open
> but not being written to. I did a sync right before this:
>
> https://pastebin.com/jHh5tfpe
>
> And then: btrfs fi defrag -l 8M system.journal
>
> https://pastebin.com/Kq1GjJuh
>
> Looks like most of it was a no op. So it seems btrfs in this case is
> not confused by so many small extent items, it know they are
> contiguous?
>
> It doesn't answer the question what the "too small" threshold is for
> BTRFS_IOC_DEFRAG, which is what sd-journald is using, though.
>
> Another sync, and then, 'journalctl --rotate' and the resulting
> archived file is now:
>
> https://pastebin.com/aqac0dRj
>
> These are not the same results between the two ioctls for the same
> file, and not the same result as what you get with -l 32M (which I do
> get if I use the default 32M). The BTRFS_IOC_DEFRAG interleaved result
> is peculiar, but I don't think we can say it's ineffective, it might
> be an intentional no op either because it's nodatacow or it sees that
> these many extents are mostly contiguous and not worth defragmenting
> (which would be good for keeping write amplification down).
>
> So I don't know, maybe it's not wrong.
>
> --
> Chris Murphy
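
P.S. In case it helps to repeat the check on another machine: below is a minimal sketch (my own helper, not something taken from systemd) that takes the text printed by lsattr and lists the files that still carry the 'C' (NOCOW) flag, i.e. the ones I would consider "pending" in the listing above. The names nocow_files and SAMPLE are only illustrative, and the parsing assumes the plain "flags filename" layout shown in my lsattr output.

```python
# Sketch: pick the NOCOW ('C') files out of lsattr output.
# Assumption: each useful line is "<flags> <filename>", as in the listing above.

SAMPLE = """\
[...]
--------------------- user-1000@97aaac476dfc404f9f2a7f6744bbf2ac-000000000000bd67-0005baed65a0901f.journal
---------------C----- user-1000.journal
---------------C----- system.journal
"""


def nocow_files(lsattr_output):
    """Return the file names whose lsattr flag field contains 'C'."""
    pending = []
    for line in lsattr_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2:
            # Blank lines, "[...]" markers and similar are skipped.
            continue
        flags, name = parts
        # A flag field consists only of dashes and attribute letters.
        if not all(c == "-" or c.isalpha() for c in flags):
            continue
        if "C" in flags:
            pending.append(name.strip())
    return pending


if __name__ == "__main__":
    for name in nocow_files(SAMPLE):
        print(name)
```

To use it on real data one could feed it the captured output of "sudo lsattr *" from the journal directory. Note that it only reports the flag; it says nothing about how many extents a file has.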
-- 
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D 17B2 0EDA 9B37 8B82 E0B5