Hello,

Milos Nikic, on Thu 05 Mar 2026 09:31:26 -0800, wrote:
> Hurd VFS works in 3 layers:
> 
>  1. Node cache layer: The abstract node lives here and it is the ground truth
>     of a running file system. When one does a stat myfile.txt, we get the
>     information straight from the cache. When we create a new file, it gets
>     placed in the cache, etc.
> 
>  2. Pager layer: This is where nodes are serialized into the actual physical
>     representation (4KB blocks) that will later be written to disk.
> 
>  3. Hard drive: The physical storage that receives the bytes from the pager.
> 
> During normal operations (not a sync mount, fsync, etc.), the VFS operates
> almost entirely on Layer 1: The Node cache layer. This is why it's super fast.
> User changed atime? No problem. It just fetches a node from the node cache
> (hash table lookup, amortized to O(1)) and updates the struct in memory. And
> that is it.

Yes, that is so that we are as efficient as possible.
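
To make the "Layer 1 only" path concrete, here is a toy sketch in C. It is not
the libdiskfs code: the struct, the table and the toy_* function names are all
made up for illustration. It only shows the idea of a hash lookup followed by
an in-memory update that marks the node dirty, with no I/O at all.

```c
#include <stddef.h>
#include <time.h>

#define TOY_BUCKETS 64

/* Hypothetical in-memory node, standing in for a cached inode. */
struct toy_node {
    unsigned long ino;      /* inode number, used as the hash key */
    time_t atime;           /* last access time, kept in memory only */
    int dirty;              /* set when memory is newer than disk */
    struct toy_node *next;  /* chaining inside one hash bucket */
};

static struct toy_node *toy_cache[TOY_BUCKETS];

/* Add a node to the cache; the caller owns the storage. */
void toy_cache_insert(struct toy_node *np)
{
    size_t b = np->ino % TOY_BUCKETS;
    np->next = toy_cache[b];
    toy_cache[b] = np;
}

/* Hash table lookup: O(1) on average as long as chains stay short. */
struct toy_node *toy_cache_lookup(unsigned long ino)
{
    struct toy_node *np;
    for (np = toy_cache[ino % TOY_BUCKETS]; np != NULL; np = np->next)
        if (np->ino == ino)
            return np;
    return NULL;
}

/* Update atime purely in memory; a later sync would pick up the dirty flag. */
int toy_touch_atime(unsigned long ino, time_t now)
{
    struct toy_node *np = toy_cache_lookup(ino);
    if (np == NULL)
        return -1;
    np->atime = now;
    np->dirty = 1;
    return 0;
}
```

Nothing here reaches the pager or the store; that only happens once something
walks the dirty nodes at sync time.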

> Only when the sync interval hits (every 30 seconds by default) does the Node
> cache get iterated and serialized to the pager layer (diskfs_sync_everything
> -> write_all_disknodes -> write_node -> pager_sync). So basically, at that
> moment, we create a snapshot of the state of the node cache and place it onto
> the pager(s).

It's not exactly a snapshot because the coherency between inodes and
data is not completely enforced (we write all disknodes before asking
the kernel to write back dirty pages, and then poke the writes).
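
A toy model of that ordering, in C, may help. This is only a sketch under
simplifying assumptions: one integer stands for an inode's metadata and one
for its data pages, and every toy_* name is hypothetical. The point is merely
that a change landing between the two phases leaves the on-disk inode older
than the on-disk data.

```c
#include <stddef.h>

/* Hypothetical file: in-memory state and what has reached "disk". */
struct toy_file {
    int mem_size;   /* metadata held in the node cache */
    int mem_data;   /* contents held in dirty pages */
    int disk_size;  /* metadata as last written */
    int disk_data;  /* contents as last written */
};

/* Phase 1: write the metadata of every node. */
static void toy_write_all_disknodes(struct toy_file *f, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        f[i].disk_size = f[i].mem_size;
}

/* Phase 2: write back the dirty data pages. */
static void toy_write_back_pages(struct toy_file *f, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        f[i].disk_data = f[i].mem_data;
}

/* Stand-in for another RPC appending to the first file during the sync. */
void toy_concurrent_append(struct toy_file *f, size_t n)
{
    if (n > 0) {
        f[0].mem_size += 1;
        f[0].mem_data += 1;
    }
}

/* Two-phase sync; `between` models activity that lands between the phases. */
void toy_sync(struct toy_file *f, size_t n,
              void (*between)(struct toy_file *, size_t))
{
    toy_write_all_disknodes(f, n);
    if (between != NULL)
        between(f, n);
    toy_write_back_pages(f, n);
}
```

With toy_concurrent_append passed as `between`, the file ends up with the old
size but the new data on disk, which is why "snapshot" is too strong a word.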

> Even then, pager_sync is called with wait = 0. It is handed to the pager,
> which sends it to Mach. At some later time (seconds or so later), Mach sends
> it back to the ext2 pager, which finally issues store_write to write it to
> Layer 3 (The Hard drive). And even that depends on how the driver reorders or
> delays it.
> 
> The effect of this architecture is that when store_write is finally called,
> the absolute latest version of the node cache snapshot is what gets written
> to the storage. Is this basically correct?

It seems to be so indeed.

> Are there any edge cases or mechanics that are wrong in this model
> that would make us receive a "stale" node cache snapshot?

Well, it can be "stale" if another RPC hasn't called
diskfs_node_update() yet, but that's what "safe" FSes are all about: they
do not actually provide more than coherency of the on-disk content, so that
fsck is not supposed to be needed. Then, if a program really wants coherency
between some files etc., it has to issue sync calls; dpkg does so, for
instance.
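
For what it's worth, the usual shape of such a sync call on the application
side looks roughly like the sketch below, using only POSIX open(), write(),
fsync(), close() and rename(). I am not claiming this is exactly what dpkg
does; the function name and the temporary-file convention are just assumptions
for the example.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Write `data` to `tmp_path`, force it to storage, then rename it over
   `path`. Returns 0 on success and -1 on any failure. */
int write_file_durably(const char *path, const char *tmp_path,
                       const char *data)
{
    size_t len = strlen(data);
    size_t off = 0;
    int fd = open(tmp_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);

    if (fd < 0)
        return -1;

    /* write() may be partial, so loop until everything is handed over. */
    while (off < len) {
        ssize_t n = write(fd, data + off, len - off);
        if (n < 0) {
            close(fd);
            unlink(tmp_path);
            return -1;
        }
        off += (size_t)n;
    }

    /* Ask the filesystem to push this file out now instead of waiting for
       the periodic sync. */
    if (fsync(fd) < 0) {
        close(fd);
        unlink(tmp_path);
        return -1;
    }
    if (close(fd) < 0) {
        unlink(tmp_path);
        return -1;
    }

    /* Only expose the new content under its final name once it is synced. */
    if (rename(tmp_path, path) < 0) {
        unlink(tmp_path);
        return -1;
    }
    return 0;
}
```

A stricter version would also fsync() the containing directory after the
rename; I left that out to keep the sketch short.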

Samuel
