Re: [PATCH] Prototype metadata journaling system for libdiskfs

Samuel Thibault Mon, 21 Jul 2025 16:04:55 -0700

Milos Nikic, on Mon, 21 Jul 2025 11:38:00 -0700, wrote:
> > Which kind of operations is spamming? As I mentioned, we most probably
> > want to implement relatime, that'll be useful to avoid many writes
> > anyway.
> 
> Mainly `utime` updates to `/dev/null` and `/dev/random`.


Which would be caught by relatime.

«
Access time is only updated if the previous access time was earlier than
or equal to the current modify or change time.
»

Better take the time to implement that, since that'll save the
corresponding inode writes too.
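
As a rough illustration of the rule quoted above, the check could look like the sketch below. The function name `relatime_needs_update` and the plain `time_t` parameters are assumptions made for illustration, not existing libdiskfs interfaces. Linux additionally refreshes an atime that is more than a day old; that refinement is left out here.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical helper, not part of libdiskfs: decide whether an access
   should update atime under the relatime rule quoted above.  The atime
   is rewritten only when it is earlier than or equal to the current
   modify time or change time.  */
bool
relatime_needs_update (time_t atime, time_t mtime, time_t chtime)
{
  return atime <= mtime || atime <= chtime;
}
```

With a check of this kind, repeated reads of a node such as /dev/null would stop dirtying the inode once its atime is newer than both its mtime and its ctime.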

> > Better use the ext3/4 native way of allocating blocks for the journal.
> 
> That’s exactly what I’d like to do next — but I’m not sure how to get there in
> this context. Would this involve allocating blocks outside the main filesystem
> namespace via libstore? Any pointers or examples would be really appreciated.

No, it's still in the disk storage. It's just that ext3 has a way to
reserve blocks for the journal. I don't know a reference for this but it
should be easy to find.
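
For orientation, a minimal sketch of how an ext3 journal is usually located: the journal lives in a reserved inode (normally inode 8) whose blocks are allocated like those of any other file, and the superblock records that inode number when the "has journal" compatibility feature is set. The struct below is a cut-down stand-in holding only the two relevant fields, not the real superblock layout, and `journal_inode` is a hypothetical helper name.

```c
#include <stdint.h>

/* Compat feature bit that marks a filesystem as having a journal.  */
#define EXT3_FEATURE_COMPAT_HAS_JOURNAL 0x0004
/* Reserved inode number conventionally used for the internal journal.  */
#define EXT2_JOURNAL_INO 8

/* Stand-in for the two superblock fields of interest; the real
   superblock has many more fields around these.  */
struct sb_journal_info
{
  uint32_t s_feature_compat;
  uint32_t s_journal_inum;
};

/* Hypothetical helper: return the inode number holding the journal,
   or 0 if the filesystem has no internal journal.  */
uint32_t
journal_inode (const struct sb_journal_info *sb)
{
  if (!(sb->s_feature_compat & EXT3_FEATURE_COMPAT_HAS_JOURNAL))
    return 0;
  return sb->s_journal_inum;
}
```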

> > Does the normal path lookup not work? At worst by rearranging some code
> > to provide an internal version not meant for RPCs.
> 
> That’s the trick: the issue isn’t how, but *when*.
> The journal contains information from before the crash, but after reboot we’re
> walking a post-crash live filesystem. If we try to resolve inode paths at boot,
> we might end up with mismatches, or restoring paths that no longer make sense.

But the journal is supposed to be in an order that makes sense
sequentially. Again, better check how ext3/4/jbd are doing it, rather
than trying to re-invent them.

> One additional note: while testing I have discovered that the filesystem
> remains read-only at that early point and only stops being read-only after
> the RPCs come online.
> If I just call diskfs_node_update that early (as I do in the patch), it
> silently has no effect (!!!)

You probably just want to set diskfs_readonly = 0 while playing the
journal, and reset it to what it was (as asked on the command line etc.)
just before unleashing RPCs.
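
A minimal sketch of that save/clear/restore pattern follows. To keep it compilable on its own, `diskfs_readonly` is declared here as a local stand-in for the libdiskfs flag, and `replay_one_record` and `replay_journal_writable` are hypothetical names for whatever the patch uses to replay the journal.

```c
/* Stand-in for the libdiskfs flag so this sketch builds standalone;
   in the translator the real variable would be used instead.  */
int diskfs_readonly = 1;

/* Hypothetical replay step: fails (-1) while the filesystem is still
   flagged read-only, succeeds (0) otherwise.  */
int
replay_one_record (void)
{
  return diskfs_readonly ? -1 : 0;
}

/* Temporarily make the filesystem writable for the replay, then put
   the flag back to whatever was requested before RPCs are served.  */
int
replay_journal_writable (void)
{
  int saved = diskfs_readonly;
  int err;

  diskfs_readonly = 0;
  err = replay_one_record ();
  diskfs_readonly = saved;

  return err;
}
```

The point of saving the old value rather than hard-coding it is that a filesystem explicitly started read-only stays read-only once the replay is done.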

> On the other hand, once RPCs are up, trying to walk the FS to replay changes
> risks deadlocks.

Sure, you don't want that.

> It feels like journaling recovery needs to happen in a carefully coordinated
> phase — perhaps a new pre-init mode, or deeper integration with `diskfs`
> itself.

Yes. Feel free to add hooks if libdiskfs doesn't have what you need.

Samuel
