Hello,

Thanks for trying to build this.
Yeah, I think hurd-0.9.git20251029 is a bit dated and doesn't contain all the code in libstore and the other libraries that the patch needs (the store_sync function, etc.), so it won't compile or work. For now it needs to be built from the master branch of the Hurd git repository, if that is an option. I could try to create a separate giant patch that encompasses all the changes since hurd-0.9.git20251029, but I'm not sure that is the right way.

Regards,
Milos

On Thu, Mar 19, 2026, 7:07 AM gfleury <[email protected]> wrote:

> Hi,
>
> I'm attempting to build Hurd with the JBD2 journaling patch on AMD64, but I'm
> encountering a compilation error that I need help resolving.
>
> Here's what I'm doing:
>
> 1. apt update
> 2. apt source hurd
> 3. cd hurd-0.9.git20251029
> 4. patch -p1 < ../v5-0001-ext2fs-Add-JBD2-journaling-to-ext2-libdiskfs.patch
> 5. sudo dpkg-buildpackage -us -uc -b
>
> The build fails with this error:
>
> ```
> ../../ext2fs/journal.c: In function 'flush_to_disk':
> ../../ext2fs/journal.c:592:17: error: implicit declaration of function 'store_sync' [-Wimplicit-function-declaration]
>   592 |   error_t err = store_sync (store);
>       |                 ^~~~~~~~~~
> ```
>
> Interestingly, the store_sync function doesn't seem to exist in
> hurd-0.9.git20251029, but it is present in the current master branch. Could
> this be a version compatibility issue with the patch?
>
> Any guidance would be appreciated. Thanks!
>
> On 2026-03-18 01:04, Milos Nikic wrote:
>
> Hello again Samuel,
>
> First of all, I want to apologize again for the patch churn over the past
> week. I wanted to put this to rest properly, and I am now sending my final,
> stable version.
>
> This is it. I have applied numerous fixes, performance tweaks, and cleanups.
> I am happy to report that this now performs on par with unjournaled ext2 on
> normal workloads, such as configuring/compiling the Hurd, installing and
> reinstalling packages via APT, and untarring large archives (like the Linux
> kernel).
> I have also heavily tested it against artificial stress conditions (which I
> am happy to share if there is interest), and it handles highly concurrent
> loads beautifully, without deadlocks or memory leaks.
>
> Progressive checkpointing ensures the filesystem runs smoothly, and the
> feature remains strictly opt-in (until a partition is tuned with tune2fs -j,
> the journal is completely inactive).
>
> The new API in libdiskfs is minimal but expressive enough to wrap all
> filesystem operations in transactions and handle strict POSIX sync barriers.
>
> Since v4, I have made several major architectural improvements:
>
> Smart auto-commit: diskfs_journal_stop_transaction now automatically commits
> to disk if needs_sync has been flagged anywhere in the nested RPC chain and
> the reference count drops to zero.
>
> Cleaned-up internal ext2 journal API: I have exposed journal_store_write and
> journal_store_read as block-device filter layers. Internal state checks
> (journal_has_active_transaction, etc.) are now strictly hidden. How the
> journal preserves the WAL property is now very obvious, as it directly
> intercepts physical store operations.
>
> The "Lifeboat" cache: those store wrappers now use a small, temporary
> internal cache to handle situations where the Mach VM pager rushes blocks
> out due to memory pressure. The Lifeboat seamlessly intercepts and absorbs
> these hazard blocks without blocking the pager or emitting warnings, even at
> peak write throughput.
>
> As before, I have added detailed comments across the patch to explain the
> state machine and locking hierarchy. I know this is a complex subsystem, so
> I am more than happy to write additional documentation in whatever form is
> needed.
>
> Once again, apologies for the rapid iterations. I won't be touching this
> code further until I hear your feedback.
> Kind regards,
> Milos
>
> On Sun, Mar 15, 2026 at 9:01 PM Milos Nikic <[email protected]> wrote:
>
> Hi Samuel,
>
> I am writing to sincerely apologize for the insane amount of patch churn
> over the last week. I know the rapid version bumps from v2 up to v4 have
> been incredibly noisy, and I want to hit the brakes before you spend any
> more time reviewing the current code.
>
> While running some extreme stress tests on a very small ext2 partition with
> the tiniest journal allowed by the tooling, I caught a few critical edge
> cases. While fixing those, I also realized that my libdiskfs VFS API
> boundary is clunkier than it needs to be. I am currently rewriting it to
> more closely match Linux's JBD2 semantics, where the VFS simply flags a
> transaction for sync and calls stop, allowing the journal to auto-commit
> when the reference count drops to zero.
>
> I'm also adding handling for cases where the Mach VM pager rushes blocks to
> the disk while they are in the process of committing. This safely
> intercepts them and will remove those warnings and WAL violations in almost
> all cases.
>
> Please completely disregard v4.
>
> I promise the churn is coming to an end. I am going to take a little time
> to finish this API contraction, stress-test it, polish it, and make sure it
> is 100% rock solid. I will be back soon with a finalized v5.
>
> Thanks for your patience with my crazy iteration process!
>
> Best, Milos
>
> On Thu, Mar 12, 2026 at 8:53 AM Milos Nikic <[email protected]> wrote:
>
> Hi Samuel,
>
> As promised, here is the thoroughly tested and benchmarked V4 revision of
> the JBD2 journal for the Hurd.
>
> This revision addresses a major performance bottleneck present in V3 under
> heavy concurrent workloads. The new design restores performance to match
> vanilla Hurd's unjournaled ext2fs while preserving full crash consistency.
>
> Changes since V3:
> - Removed the eager memcpy() from the journal_dirty_block() hot-path.
> - Introduced deferred block copying that triggers only when the transaction
>   becomes quiescent.
> - Added a `needs_copy` flag to prevent redundant memory copies.
> - Eliminated the severe lock contention and memory bandwidth pressure
>   observed in V3.
>
> Why the changes in v4 vs v3?
>
> I had previously identified that the last remaining performance bottleneck
> was the 4 KB memcpy performed every time journal_dirty_block is called, and
> I was thinking about how to improve it. A deferred copy comes to mind,
> but...
>
> The Hurd VFS locks at the node level rather than the physical block level
> (as Linux does). Because multiple nodes may share the same 4 KB disk block,
> naively deferring the journal copy until commit time can capture torn
> writes if another thread is actively modifying a neighboring node in the
> same block.
>
> Precisely because of this, V3 performed a 4 KB memcpy immediately inside
> journal_dirty_block() (copy-on-write) while the node lock was held. While
> safe, this placed expensive memory operations and global journal lock
> contention directly in the VFS hot-path, causing severe slowdowns under
> heavy parallel workloads.
>
> V4 removes this eager copy entirely by leveraging an existing transaction
> invariant: all VFS threads increment and decrement the active transaction's
> `t_updates` counter via the start/stop transaction functions. A transaction
> cannot commit until this counter reaches zero. When `t_updates == 0`, we
> are guaranteed that no VFS threads are mutating blocks belonging to the
> transaction. At that exact moment, the memory backing those blocks has
> fully settled and can be safely copied without risk of torn writes. A
> perfect place for a deferred write!
>
> journal_dirty_block() now simply records the dirty block id in a hash
> table, making the hot-path strictly O(1).
> (This is why we see such an amazing performance boost between v3 and v4.)
>
> But we also need to avoid redundant copies: because transactions remain
> open for several seconds, `t_updates` may bounce to zero and back up many
> times during a heavy workload (as multiple VFS threads start/stop the
> transaction). To avoid repeatedly copying the same unchanged blocks every
> time the counter hits zero, each shadow buffer now contains a `needs_copy`
> flag.
>
> When a block is dirtied, the flag is set. When `t_updates` reaches zero,
> only buffers with `needs_copy == 1` are copied to the shadow buffers, after
> which the flag is cleared. So two things need to be true for a block to be
> copied: 1) `t_updates` must have just hit 0, and 2) `needs_copy` must be 1.
>
> This architecture completely removes the hot-path bottleneck. Journaled
> ext2fs now achieves performance virtually identical to vanilla ext2fs, even
> under brutal concurrency (e.g. scripts doing heavy writes from multiple
> shells at the same time).
>
> I know this is a dense patch with a lot to unpack. I've documented the
> locking and Mach VM interactions as thoroughly as possible in the code
> itself (roughly 1/3 of the lines in ext2fs/journal.c are comments), but I
> understand there is only so much nuance that can fit into C comments. If it
> would be helpful, I would be happy to draft a dedicated document detailing
> the journal's lifecycle, its hooks into libdiskfs/ext2, and the rationale
> behind the macro-transaction design, so future developers have a clear
> reference.
>
> Looking forward to your thoughts.
>
> Best,
> Milos
>
> On Tue, Mar 10, 2026 at 9:25 PM Milos Nikic <[email protected]> wrote:
>
> Hi Samuel,
>
> Just a quick heads-up: please hold off on reviewing this V3 series.
> While V3 works fast in simple single-threaded scenarios (like configure or
> make ext2fs), I found that under heavy multi-threaded stress tests a
> significant performance degradation happens, due to a lock contention
> bottleneck caused by the eager VFS memcpy hot-path. (The memcpy inside
> journal_dirty_block, which is called thousands of times a second, really
> becomes a performance problem.)
>
> I have been working on a much cleaner approach that safely defers the block
> copying to the quiescent transaction stop state. It completely eliminates
> the VFS lock contention and brings the journaled performance back to
> vanilla ext2fs levels, even with many threads competing at
> writing/reading/renaming in the same place.
>
> I am going to test this new architecture thoroughly over the next few days
> and will send it as V4 once I am certain it is rock solid.
>
> Thanks!
>
> On Mon, Mar 9, 2026 at 12:15 PM Milos Nikic <[email protected]> wrote:
>
> Hello Samuel and the Hurd team,
>
> I am sending over v3 of the journaling patch. I know v2 is still pending
> review, but while testing and profiling based on previous feedback, I
> realized the standard mapping wasn't scaling well for metadata-heavy
> workloads. I wanted to send this updated architecture your way to save you
> from spending time reviewing the obsolete v2 code.
>
> This version keeps the core JBD2 logic from v2 but introduces several
> structural optimizations, bug fixes, and code cleanups:
> - Robin Hood hash map: replaced ihash with a custom map for significantly
>   tighter cache locality and faster lookups.
> - O(1) slab allocator: added a pre-allocated pool to make transaction
>   buffers zero-allocation in the hot path.
> - Unified buffer tracking: eliminated the dual linked-list/map structure in
>   favor of just the map, fixing a synchronization bug from v2 and
>   simplifying the code.
> - A few other small bug fixes.
> - Refactored dirty block hooks: moved the journal_dirty_block calls from
>   inode.c directly into the ext2fs.h low-level block computation functions
>   (record_global_poke, sync_global_ptr, record_indir_poke, and alloc_sync).
>   This feels like a more natural fit and makes it much easier to ensure we
>   aren't missing any call sites.
>
> Performance benchmarks:
> I ran repeated tests on my machine to measure the overhead, comparing this
> v3 journal implementation against vanilla Hurd.
>
> make ext2fs (CPU/data bound - 5 runs):
>   Vanilla Hurd average: ~2m 40.6s
>   Journal v3 average:   ~2m 41.3s
>   Result: statistical tie. Journal overhead is practically zero.
>
> make clean && ../configure (metadata bound - 5 runs):
>   Vanilla Hurd average: ~3.90s (with latency spikes up to 4.29s)
>   Journal v3 average:   ~3.72s (rock-solid consistency, never breaking 3.9s)
>   Result: journaled ext2 is actually faster and more predictable here due
>   to the WAL absorbing random I/O.
>
> Crash consistency proof:
> Beyond performance, I wanted to demonstrate the actual crash recovery in
> action.
> 1. Boot Hurd, log in, create a directory (/home/loshmi/test-dir3).
> 2. Wait for the 5-second kjournald commit tick.
> 3. Hard crash the machine (kill -9 the QEMU process on the host).
>
> Inspecting from the Linux host before recovery shows the inode is
> completely busted (as expected):
>
> sudo debugfs -R "stat /home/loshmi/test-dir3" /dev/nbd0
>
> debugfs 1.47.3 (8-Jul-2025)
> Inode: 373911   Type: bad type   Mode: 0000   Flags: 0x0
> Generation: 0   Version: 0x00000000
> User: 0   Group: 0   Size: 0
> File ACL: 0   Translator: 0
> Links: 0   Blockcount: 0
> Fragment: Address: 0   Number: 0   Size: 0
> ctime: 0x00000000 -- Wed Dec 31 16:00:00 1969
> atime: 0x00000000 -- Wed Dec 31 16:00:00 1969
> mtime: 0x00000000 -- Wed Dec 31 16:00:00 1969
> BLOCKS:
>
> Note: On vanilla Hurd, running fsck here would permanently lose the
> directory or potentially cause further damage, depending on luck.
> Triggering the journal replay:
>
> sudo e2fsck -fy /dev/nbd0
>
> Inspecting immediately after recovery:
>
> sudo debugfs -R "stat /home/loshmi/test-dir3" /dev/nbd0
>
> debugfs 1.47.3 (8-Jul-2025)
> Inode: 373911   Type: directory   Mode: 0775   Flags: 0x0
> Generation: 1773077012   Version: 0x00000000
> User: 1001   Group: 1001   Size: 4096
> File ACL: 0   Translator: 0
> Links: 2   Blockcount: 8
> Fragment: Address: 0   Number: 0   Size: 0
> ctime: 0x69af0213 -- Mon Mar 9 10:23:31 2026
> atime: 0x69af0213 -- Mon Mar 9 10:23:31 2026
> mtime: 0x69af0213 -- Mon Mar 9 10:23:31 2026
> BLOCKS:
> (0):1507738
> TOTAL: 1
>
> The journal successfully reconstructed the directory, and logdump confirms
> the transactions were consumed perfectly.
>
> I have run similar hard-crash tests for rename, chmod, chown, etc., with
> the same successful recovery results.
>
> I've attached the v3 diff. Let me know what you think of the new hash map
> and slab allocator approach!
>
> Best,
> Milos
>
> On Fri, Mar 6, 2026 at 10:06 PM Milos Nikic <[email protected]> wrote:
>
> And here is the last one...
>
> I hacked up an improvement for journal_dirty_block to try to see if I could
> speed it up a bit:
> 1) Used a specialized Robin Hood based hash table for speed (no tombstones,
>    etc.). (I took it from one of my personal projects... just specialized
>    it here a bit.)
> 2) Used a small slab allocator to avoid malloc-ing in the hot path.
> 3) Liberally sprinkled __rdtsc() to get a sense of cycle time inside
>    journal_dirty_block.
>
> Got to say, just this simple local change managed to shave off 3-5% of the
> slowness.
>
> So my test is:
> - Boot Hurd.
> - Inside Hurd, go to the Hurd build directory.
> - Run:
>   $ make clean && ../configure
>   $ time make ext2fs
>
> I do it multiple times for 3 different versions of the ext2 libraries:
> 1) Vanilla Hurd (no journal): ~avg 151 seconds
> 2) Enhanced JBD2 (slab + custom hash): ~159 seconds (5% slower!)
> 3) Baseline JBD2 (malloc + libihash, what was sent in V2): ~168 seconds
>
> Of course there is a lot of variability, and my laptop is not a perfect
> environment for these kinds of benchmarks, but this is what I have.
>
> My printouts on the screen show this:
>
> ext2fs: part:5:device:wd0: warning: === JBD2 STATS ===
> ext2fs: part:5:device:wd0: warning: Total Dirty Calls: 339105
> ext2fs: part:5:device:wd0: warning: Total Function:  217101909 cycles
> ext2fs: part:5:device:wd0: warning: Total Lock Wait: 16741691 cycles
> ext2fs: part:5:device:wd0: warning: Total Alloc:     673363 cycles
> ext2fs: part:5:device:wd0: warning: Total Memcpy:    137938008 cycles
> ext2fs: part:5:device:wd0: warning: Total Hash Add:  258533 cycles
> ext2fs: part:5:device:wd0: warning: Total Hash Find: 29501960 cycles
> ext2fs: part:5:device:wd0: warning: --- AVERAGES (Amortized per call) ---
> ext2fs: part:5:device:wd0: warning: Avg Function Time: 640 cycles
> ext2fs: part:5:device:wd0: warning: Avg Lock Wait: 49 cycles
> ext2fs: part:5:device:wd0: warning: Avg Memcpy:    406 cycles
> ext2fs: part:5:device:wd0: warning: Avg Malloc:    1 cycles
> ext2fs: part:5:device:wd0: warning: Avg Hash Add:  0 cycles
> ext2fs: part:5:device:wd0: warning: Avg Hash Find: 86 cycles
> ext2fs: part:5:device:wd0: warning: ==================
>
> The averages here say a lot... with these improvements we are now down to
> basically memcpy time, and for copying 4096 bytes of RAM I'm not sure we
> can make it take less than ~400 cycles, so we are hitting hardware
> limitations. It would be great if we could avoid the memcpy here altogether,
> or delay it until commit or similar. I have some ideas, but they all require
> drastic changes across libdiskfs and ext2fs, and I'm not sure a few
> remaining percentage points of slowdown warrant that.
> Also, wow: during the ext2 compilation, this function (journal_dirty_block)
> is being called a bit more than 1000 times per second (for each and every
> block that is ever touched by the compiler).
>
> I am attaching the altered journal.c with these changes, in case anyone is
> interested in seeing the localized changes.
>
> Regards,
> Milos
>
> On Fri, Mar 6, 2026 at 11:09 AM Milos Nikic <[email protected]> wrote:
>
> Hi Samuel,
>
> One quick detail I forgot to mention regarding the performance analysis:
> the entire ~0.4s performance impact I measured is isolated exclusively to
> journal_dirty_block.
>
> To verify this, I ran an experiment where I stubbed out journal_dirty_block
> so it just returned immediately (which obviously makes for a very fast, but
> not very useful, journal!). With that single function bypassed, the
> filesystem performs identically to vanilla Hurd.
>
> This confirms that the background kjournald flusher, the transaction
> reference counting, and the checkpointing logic add absolutely no
> noticeable latency to the VFS. The overhead is strictly tied to the physics
> of the memory copying and hash map lookups in that one function, which we
> can improve in subsequent patches.
>
> Thanks, Milos
>
> On Fri, Mar 6, 2026 at 10:55 AM Milos Nikic <[email protected]> wrote:
>
> Hi Samuel,
>
> Thanks for reviewing my mental model on V1; I appreciate the detailed
> feedback.
>
> Attached is the v2 patch. Here is a breakdown of the architectural changes
> and refactors based on your review:
>
> 1. diskfs_node_update and the pager
> Regarding the question, "Do we really want to update the node?": yes, we
> must update it with every change. JBD2 works strictly at the physical block
> level, not the abstract node cache level. To capture a node change in the
> journal, the block content must be physically serialized to the transaction
> buffer. Currently, this path is diskfs_node_update -> diskfs_write_disknode
> -> journal_dirty_block.
> When wait is 0, this just copies the node details from the node cache to
> the pager. It is strictly an in-memory serialization and is extremely fast.
> I have updated the documentation for diskfs_node_update to explicitly
> describe this behavior, so future maintainers understand it isn't
> triggering synchronous disk I/O and doesn't measurably increase the latency
> of the file system. journal_dirty_block is now one of the most hammered
> functions in libdiskfs/ext2; more on that below.
>
> 2. Synchronous wait & factorization
> I completely agree with your factorization advice:
> write_disknode_journaled has been folded directly into
> diskfs_write_disknode, making it much cleaner.
> Regarding the wait flag: we are no longer ignoring it! Instead of blocking
> the VFS deep in the stack, we now set an "IOU" flag on the transaction.
> This bubbles the sync requirement up to the outer RPC layer, which is the
> only place safe enough to actually sleep on the commit and thus maintain
> the POSIX sync requirement without deadlocking.
>
> 3. Multiple writes to the same metadata block
> "Can it happen that we write several times to the same metadata block?"
> Yes, multiple nodes can live in the same block. However, because the Mach
> pager always flushes the latest snapshot of the block, we don't have an
> issue with mixed or stale data hitting the disk.
> If RPCs hit while the pager is actively writing, that is all captured in
> the running transaction. If the running transaction happens to contain the
> same blocks the pager is committing, the running transaction will be
> forcibly committed.
>
> 4. The new libdiskfs API
> I added two new opaque accessors to diskfs.h:
>
> diskfs_journal_set_sync
> diskfs_journal_needs_sync
>
> This allows inner nested functions to declare a strict need for a POSIX
> sync without causing lock inversions. We only commit at the top RPC layer,
> once the operation is fully complete and the locks are dropped.
>
> 5.
Cleanups & Ordering
> - Removed the redundant record_global_poke calls.
> - Reordered the pager write notification in journal.c to sit after the
>   committing function, as the pager write happens after the journal commit.
> - Merged the ext2_journal checks inside diskfs_journal_start_transaction to
>   return early.
> - Reverted the bold unlock moves.
> - Fixed the information leaks.
> - Elevated the deadlock/WAL-bypass logs to ext2_warning.
>
> Performance:
> I investigated the ~0.4s regression (an increase from 4.9s to 5.3s) on my
> SSD during a heavy Hurd ../configure test. By stubbing out
> journal_dirty_block, performance returned to vanilla Hurd speeds, isolating
> the overhead to that specific function.
>
> A nanosecond profile reveals the cost is evenly split across the mandatory
> physics of a block journal:
>
> 25%: lock contention (global transaction serialization)
> 22%: memcpy (shadowing the 4 KB blocks)
> 21%: hash find (hurd_ihash lookups for block deduplication)
>
> I was surprised to see hurd_ihash taking up nearly a quarter of the
> overhead. I added some collision mitigation, but left further improvements
> out of this patch to keep the scope tight. In the future, we could drop the
> malloc entirely using a slab allocator and optimize the hash map to get
> this overhead closer to zero (along with introducing a "frozen data"
> concept like Linux does, but that would be a bigger, non-localized change).
> Final note on lock hierarchy
> The intended, deadlock-free use of the journal in libdiskfs is best
> illustrated by the CHANGE_NODE_FIELD macro in libdiskfs/priv.h:
>
>   txn = diskfs_journal_start_transaction ();
>   pthread_mutex_lock (&np->lock);
>   (OPERATION);
>   diskfs_node_update (np, diskfs_synchronous);
>   pthread_mutex_unlock (&np->lock);
>   if (diskfs_synchronous || diskfs_journal_needs_sync (txn))
>     diskfs_journal_commit_transaction (txn);
>   else
>     diskfs_journal_stop_transaction (txn);
>
> By keeping journal operations strictly outside of the node
> locking/unlocking phases, we treat the journal as the outermost "lock" on
> the file system, preventing deadlocks by construction.
>
> Kind regards,
> Milos
>
> On Thu, Mar 5, 2026 at 12:41 PM Samuel Thibault <[email protected]> wrote:
>
> Hello,
>
> Milos Nikic, on Thu, Mar 5, 2026 at 09:31:26 -0800, wrote:
> > The Hurd VFS works in 3 layers:
> >
> > 1. Node cache layer: the abstract node lives here, and it is the ground
> > truth of a running file system. When one does a stat myfile.txt, we get
> > the information straight from the cache. When we create a new file, it
> > gets placed in the cache, etc.
> >
> > 2. Pager layer: this is where nodes are serialized into the actual
> > physical representation (4 KB blocks) that will later be written to disk.
> >
> > 3. Hard drive: the physical storage that receives the bytes from the
> > pager.
> >
> > During normal operations (not a sync mount, fsync, etc.), the VFS
> > operates almost entirely on layer 1, the node cache layer. This is why
> > it's super fast. User changed atime? No problem. It just fetches a node
> > from the node cache (hash table lookup, amortized O(1)) and updates the
> > struct in memory. And that is it.
>
> Yes, so that we get as efficient as possible.
>
> > Only when the sync interval hits (every 30 seconds by default) does the
> > node cache get iterated and serialized to the pager layer
> > (diskfs_sync_everything -> write_all_disknodes -> write_node ->
> > pager_sync).
> > So basically, at that moment, we create a snapshot of the state of the
> > node cache and place it onto the pager(s).
>
> It's not exactly a snapshot, because the coherency between inodes and data
> is not completely enforced (we write all disknodes before asking the kernel
> to write back dirty pages, and then poke the writes).
>
> > Even then, pager_sync is called with wait = 0. It is handed to the pager,
> > which sends it to Mach. At some later time (seconds or so later), Mach
> > sends it back to the ext2 pager, which finally issues store_write to
> > write it to layer 3 (the hard drive). And even that depends on how the
> > driver reorders or delays it.
> >
> > The effect of this architecture is that when store_write is finally
> > called, the absolute latest version of the node cache snapshot is what
> > gets written to the storage. Is this basically correct?
>
> It seems to be so indeed.
>
> > Are there any edge cases or mechanics that are wrong in this model that
> > would make us receive a "stale" node cache snapshot?
>
> Well, it can be "stale" if another RPC hasn't called diskfs_node_update()
> yet, but that's what "safe" filesystems are all about: they don't actually
> provide more than coherency of the content on the disk, so fsck is not
> supposed to be needed. Then, if a program really wants coherency between
> some files etc., it has to issue sync calls; dpkg does it, for instance.
>
> Samuel
