Point!

Question 1: does anyone know what's going on with the nlock counter in
Blocks? Its purpose is explained well enough, but I've yet to find the
reason it's necessary. That doesn't mean the explanation isn't there;
there are plenty of comments I've overlooked until I went looking for
them or grepped specifically. But so far, nothing.

It's clear that there can be multiple references to the same Block that
are all held by the same thread, and so we can't simply take the lock
multiple times. What I've not yet divined is why that situation is
possible in the first place.


Question 2: the rollback mechanism. Why? Why does it exist?

The flush thread's logic is pretty clear:

- Scan the cache.
- Find the list of all dirty blocks.
- Sort them into the queue in order of ascending block address.
- Loop through the queue, attempting to flush each block.
        - In case of a failure (e.g. if a lock can't be acquired), the
          block is added to a new queue of failed blocks.
        - It tries flushing that queue after it finishes with the
          current one.
- Once the queue has been emptied, it starts again.

What weirds me out is the dependency handling. If block b0 depends on
b1, and we scan through in order, and we try to flush b0 before its
dependency, we synthesize a version of b0 that points to an outdated
version of b1 and flush that.

But this leaves the block dirty, which means it doesn't even get
removed from the queue. Nor does it meaningfully help performance,
because we're just writing a block that we flat out don't need to. If
we had instead failed to lock b0, say because someone was reading it at
the time, we'd simply move on in the queue, write b1 first, and then
write b0 afterwards on the next pass.

Is there any reason we don't just _always_ skip dependents, on purpose,
until the dependencies have been flushed?

This would also allow simplifying the logic a bit: there's no real
reason to form a queue of blocks we failed to flush, since we can just
pick them up on the next scan anyway. And as far as I can tell, there's
no case where we'd ever try reusing a block before it's been flushed,
so we shouldn't actually need this ordering at all.

It seems like a lot of extra code, and I'm fairly certain it has at
least one remaining bug: I've regularly encountered a case where, if
power is lost while _updating_ a file, fossil ends up losing both the
new _and the old_ copies. The old block exists and has not been reused,
but after reboot the parent block pointing to it no longer does. I'm
fairly certain that's due to an edge case in this logic.

And there's no real performance win either, as far as I can tell.

So, am I missing something?


And, from a design perspective: why are fossil-local snapshots even a
thing? The whole _point_ of fossil is to be used with venti. Fossil
snapshots are inherently inferior to venti archives, and their existence
makes the code a lot more complicated than it needs to be. If we only
had venti archives and local data, we'd only have to copy on write _to
venti blocks_, never to local ones, which would make a lot of the local
block management code simpler and easier to get correct.

Given the decisions that went into venti's design to make it hard to get
the implementation wrong, I find it odd to see what seems to be a
redundant feature that's hard to get _right_ in the file system made
specifically for it.


- Noam Preil


------------------------------------------
9fans: 9fans
Permalink: 
https://9fans.topicbox.com/groups/9fans/Te0d3818e550641af-M8e290fe42e990edcbf7b724c