Hi Chris,

It seems that systemd-journald is smarter (and more complex) than I thought:

1) systemd-journald sets the "live" journal as NOCOW; *when* (see below) it
closes a file, it marks it as COW again and then defragments it [1]

2) looking at the code, I suspect that systemd-journald closes the
file asynchronously [2]. This means that looking at the "live" journal
is not sufficient. In fact:

/var/log/journal/e84907d099904117b355a99c98378dca$ sudo lsattr $(ls -rt *)
[...]
--------------------- user-1000@97aaac476dfc404f9f2a7f6744bbf2ac-000000000000bd4f-0005baed61106a18.journal
--------------------- system@3f2405cf9bcf42f0abe6de5bc702e394-000000000000bd64-0005baed659feff4.journal
--------------------- user-1000@97aaac476dfc404f9f2a7f6744bbf2ac-000000000000bd67-0005baed65a0901f.journal
---------------C----- system@3f2405cf9bcf42f0abe6de5bc702e394-000000000000cc63-0005bafed4f12f0a.journal
---------------C----- user-1000@97aaac476dfc404f9f2a7f6744bbf2ac-000000000000cc85-0005baff0ce27e49.journal
---------------C----- system@3f2405cf9bcf42f0abe6de5bc702e394-000000000000cd38-0005baffe9080b4d.journal
---------------C----- user-1000@97aaac476dfc404f9f2a7f6744bbf2ac-000000000000cd3b-0005baffe908f244.journal
---------------C----- user-1000.journal
---------------C----- system.journal

The output above means that the last 6 files (the ones with the "C" flag)
are still "pending" defragmentation. Once they are "closed", the NOCOW flag
will be removed and a defragmentation will start.
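
To pick out the "pending" files without reading the flags by eye, something
like the following could be used. It is only a sketch: it assumes that each
lsattr line has the form "<flags> <file name>" on a single line, and the
function name pending_nocow is made up for illustration (it is not part of
systemd or e2fsprogs).

```python
def pending_nocow(lsattr_output):
    """Return the file names whose lsattr flag field contains 'C' (NOCOW)."""
    pending = []
    for line in lsattr_output.splitlines():
        # Assumed layout: flag field, whitespace, then the file name.
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # skip blank lines and markers such as "[...]"
        flags, name = parts
        if "C" in flags:
            pending.append(name.strip())
    return pending


# Sample taken from the listing above (shortened to four files).
sample = """\
--------------------- system@3f2405cf9bcf42f0abe6de5bc702e394-000000000000bd64-0005baed659feff4.journal
---------------C----- system@3f2405cf9bcf42f0abe6de5bc702e394-000000000000cd38-0005baffe9080b4d.journal
---------------C----- user-1000.journal
---------------C----- system.journal
"""

for name in pending_nocow(sample):
    print(name)
```

Only the flag field is inspected, so a "C" inside a file name does not cause
a false match.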

Right now my journals have only a few extents (2 or 3). But I have seen
cases where the more recent files have hundreds of extents, and after a few
"journalctl --rotate" runs the older files become less fragmented.

[1] https://github.com/systemd/systemd/blob/fee6441601c979165ebcbb35472036439f8dad5f/src/libsystemd/sd-journal/journal-file.c#L383
[2] https://github.com/systemd/systemd/blob/fee6441601c979165ebcbb35472036439f8dad5f/src/libsystemd/sd-journal/journal-file.c#L3687

On 2/10/21 7:37 AM, Chris Murphy wrote:
This is an active (but idle) system.journal file. That is, it's open
but not being written to. I did a sync right before this:

https://pastebin.com/jHh5tfpe

And then: btrfs fi defrag -l 8M system.journal

https://pastebin.com/Kq1GjJuh

Looks like most of it was a no op. So it seems btrfs in this case is
not confused by so many small extent items; it knows they are
contiguous?

It doesn't answer the question of what the "too small" threshold is for
BTRFS_IOC_DEFRAG, which is what sd-journald is using, though.

Another sync, and then, 'journalctl --rotate' and the resulting
archived file is now:

https://pastebin.com/aqac0dRj

These are not the same results between the two ioctls for the same
file, and not the same result as what you get with -l 32M (which I do
get if I use the default 32M). The BTRFS_IOC_DEFRAG interleaved result
is peculiar, but I don't think we can say it's ineffective, it might
be an intentional no op either because it's nodatacow or it sees that
these many extents are mostly contiguous and not worth defragmenting
(which would be good for keeping write amplification down).

So I don't know, maybe it's not wrong.

--
Chris Murphy



--
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
