Hey Samuel,

Yeah, documentation in the code is king.

I want to add that my earlier report was not accurate.
The bug was not in the pager or in gnumach; it was in my own code, in how I
calculate blocks etc.
That is all fixed now.
Progressive checkpointing is now working well, and the journal is (almost)
empty most of the time!

I'm now in the phase of polishing, benchmarking etc.

I hit one subtle architectural question where I would appreciate your input.

We now have new failure modes. For instance, what happens if we are not able
to commit the transaction?
Or if we cannot add a new buffer to the existing transaction, say because we
cannot allocate a buffer?
This all happens under diskfs_node_update...
We have multiple options here:
1) Panic. The journal is in an inconsistent state and we cannot guarantee
the soundness of the system.
Maybe extreme, but a possibility.
2) Issue a warning and continue. Things will eventually be written to the
file system without being journaled. Not great.
3) This is a subtle one: we could bubble up the error and let the VFS code
decide what it wants to do. For instance, if we failed during an atime
update, there is no need to panic, but if a rename has gone bad, that is a
different story.
For that we would need to change diskfs_node_update to return error_t,
which would require touching many files.
That could be done in a separate patch.
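
To make option 3 a bit more concrete, here is a rough sketch in plain C.
None of these names exist in libdiskfs or in my patch: update_reason,
journal_add_buffer_sketch, node_update_sketch and vfs_update_sketch are
made up for illustration, and a plain int stands in for error_t so the
snippet compiles on its own. It only shows the shape of the idea, not how
the real code would look.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical: why the node is being updated.  Not a libdiskfs type.  */
enum update_reason { UPDATE_ATIME, UPDATE_RENAME };

/* Hypothetical stand-in for adding a buffer to the running transaction.
   alloc_ok simulates whether the buffer allocation succeeds.  */
static int
journal_add_buffer_sketch (bool alloc_ok)
{
  return alloc_ok ? 0 : ENOMEM;
}

/* Hypothetical variant of diskfs_node_update that reports a journal
   failure to its caller instead of returning void.  */
static int
node_update_sketch (bool alloc_ok)
{
  return journal_add_buffer_sketch (alloc_ok);
}

/* Caller-side policy (assumed, not existing code): the caller knows what
   kind of operation it is performing and decides how serious a journal
   failure is.  */
static int
vfs_update_sketch (enum update_reason why, bool alloc_ok)
{
  int err = node_update_sketch (alloc_ok);
  if (err == 0)
    return 0;

  switch (why)
    {
    case UPDATE_ATIME:
      /* A lost atime update is tolerable: swallow the error.  */
      return 0;
    case UPDATE_RENAME:
      /* A rename must be journaled: hand the error to the caller.  */
      return err;
    }
  return err;
}
```

With this shape, a failed atime update is silently dropped, while a failed
rename returns the error (ENOMEM here) to whoever requested it.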

What are your thoughts here?

Thanks in advance
Milos


On Tue, Feb 17, 2026 at 10:35 PM Samuel Thibault <[email protected]>
wrote:

> Hello,
>
> Milos Nikic, on Tue, 17 Feb 2026 11:00:34 -0800, wrote:
> > Ok let me maybe explain myself better and how I understand what is
> > going on.
>
> Ok, but my point is that it's in the source code that this should be
> explained :)
>
> > This actually helps with the fact that things are first in the journal
> > and only then in the file system.
>
> "Helping" is not enough :)
>
> > Yes, journal_block_is_active function is the bulwark against
> > filesystem writes happening before the journal.
>
> And thus definitely needs to be documented in the source code itself, so
> readers get it easily.
>
> > I added logic into the ext2 pager to notify the journal when it is
> > writing blocks. Now the journal keeps track of which committed
> > transactions it can "retire" and progress the superblock tail.
>
> Cool :)
>
> > The Issue:  There are files and blocks (/dev/null, /tmp folder,
> > /tmp/.X11-unix, /var/log and some others) that seem to get hammered a
> > lot with metadata updates (mostly timestamps),
> [...]
> > It seems to me most of these are just access time updates. One idea
> > would be to simply ignore atime updates in the journal logic so we
> > don't wait for them?
>
> /var/log is expected for data, but e.g. /dev/null is *really* not
> expected. Normally, the relatime option should already be taking care of
> updating atime only once a day per file when it's already younger than
> mtime/ctime. If it's not, we should really fix it, we have no reason to
> write that often.
>
> > yet the ext2 pager never seems to write them back.
>
> For translators, it is expected that no data is written. But still we
> shouldn't need to update the time, that's a bug that should be fixed.
>
> Milos Nikic, on Tue, 17 Feb 2026 19:57:33 -0800, wrote:
> > Yes, some "files" like /dev/null are translators and can be excluded
> > based on mode alone. (Whether that is a good idea is a separate
> > question.)
>
> We shouldn't have to exclude explicitly, relatime should be enough.
>
> > But there are other files that look perfectly ordinary:
> > For instance:
> >  /etc/resolv.conf
> > or
> > /tmp/.X11-unix
> > /tmp/.ICE-unix
> >
> > Occasionally their mode drops to 0, but overall these files are
> > regular files (not translators) that for some reason aren't handled
> > by the ext2 pager.
> >
> > And it's not all atime updates either... there are "other" updates as well.
> > This is all early boot though, but I still don't understand why the
> > ext2 pager isn't handling them.
>
> Possibly there's a bug to fix in there.
>
> > A few things come to mind, if we want to pursue them:
> > 1) Aggressive Filtering (The "Strict Lazy" approach)
> >    Logic: If !S_ISREG(mode) && !S_ISDIR(mode), ignore ALL timestamp-only
> > updates. Only journal if mode/uid/size changes.
> >     Pros: Likely solves the issue completely.
> >     Cons: Potentially risky if a file transitions states (e.g., git
> > temp files) or if we miss legitimate metadata updates on special nodes.
>
> Yeah, we don't want that.
>
> > 2) Active Checkpointing (The "Sweeper")
> >     Logic: If a transaction is stuck waiting for blocks, the journal
> > thread explicitly calls store_write for those blocks, bypassing the
> > Pager's dirty-check.
> >     Pros: Guarantees consistency.
> >     Cons: High complexity. It fights the Pager's logic and seems like
> > a large architectural change.
>
> We don't want to paper over what looks like a pager bug. We want to fix
> the pager.
>
> > 3) Perhaps just abandon the block-by-block tail advancement idea for now,
> > and revert to the "flush when almost full" approach, which works well.
>
> If the scenario doesn't happen too often, the flush-when-almost-full can
> stay alongside the progressive flush.
>
> Samuel
>
