Stephan Beal wrote:
> Fossil separates the names and "body" of a file, and any number of files
> may refer to the same blob body. The zlib-level compression is part of the
> implementation, not explicitly accounted for anywhere in the model.
[snip]
> Manifests are themselves stored as blobs, so they may _also_
> undergo app-level (zlib) compression.

Yes, I realize all that, and it's all good design. That's not the part I'm 
throwing peanuts at.

>> In the case of Fossil, manifests are the canonical structure for recording
>> filenames. If you change that part of a manifest, the user would care. So
>> you must not change it. This is why manifests are part of the content.
>
> You _cannot_ change it, so i'm not sure what your point is there. All
> fossil data is immutable.

You _can_ change it, but if you do so, then you break things, which of course 
is what you mean by ‟you cannot change it” (i.e. we're just wording the same 
thing differently). My point here was that it being something you ‟cannot” 
(i.e. must not) change is what makes it part of the content. This was to 
support my later (main) point that Fossil conflates content and storage matters 
by putting the B-card (a storage matter, which is something that should be 
changeable without breaking things) in something that must not change.

> To be fair: B-cards (manifest/delta compression) were added long
> after the initial manifest design, and had to be done in a compatible
> manner.

It isn't forward compatible (old code can't handle manifests that have 
B-cards), so the only thing you achieved is backward compatibility. For the 
latter, there was no need to put B-cards into manifests. Ironically, keeping 
the B-cards out of manifests, and putting them in a separate place (where they 
belong), would have enabled forward compatibility.

Instead of manifests with B-cards, create a new type of artifact, a 
thingy-with-B-card, that is not a manifest, but has the same format that you 
chose for your manifest-with-B-card. Store this artifact the same as any other 
artifact. For a manifest M, use the same UUID that you would have used if there 
were no such thing as B-cards, but then in the blob table record R for that 
UUID, instead of recording M, record an instruction to interpret the specified 
thingy-with-B-card artifact to generate M. Optionally cache a copy of that 
thingy in a field of R for speed (to avoid a second lookup), or optionally even 
cache a copy of M itself if you can afford the space.
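A minimal sketch of that layout, in Python for brevity. Everything in it is hypothetical: the toy manifest format (sorted "F name hash" lines, no spaces in names), the `Repo`/`BlobRecord` names, and the use of SHA1 are illustrative assumptions, not Fossil's actual schema. It shows a blob record holding either M itself or a pointer to a thingy, while M's UUID stays the hash of the B-card-free text.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Optional


def uuid_of(data: bytes) -> str:
    # Artifacts are named by the hash of their bytes (SHA1 here, for illustration).
    return hashlib.sha1(data).hexdigest()


def render_manifest(files: Dict[str, str]) -> bytes:
    # Toy canonical manifest with no B-card: one "F name hash" line per file, sorted.
    return "".join(f"F {n} {h}\n" for n, h in sorted(files.items())).encode()


def parse_manifest(data: bytes) -> Dict[str, str]:
    # Inverse of render_manifest (toy assumption: names contain no spaces).
    return {n: h for _, n, h in (ln.split(" ") for ln in data.decode().splitlines())}


@dataclass
class BlobRecord:
    content: Optional[bytes] = None      # the manifest M itself, or...
    derive_from: Optional[str] = None    # ...the UUID of a thingy that generates M


class Repo:
    def __init__(self) -> None:
        self.blob: Dict[str, BlobRecord] = {}   # hypothetical blob table, keyed by UUID
        self.thingies: Dict[str, bytes] = {}    # thingy-with-B-card artifacts

    def store_full(self, files: Dict[str, str]) -> str:
        m = render_manifest(files)
        self.blob[uuid_of(m)] = BlobRecord(content=m)
        return uuid_of(m)

    def store_as_thingy(self, files: Dict[str, str], baseline: str) -> str:
        base = parse_manifest(self.get(baseline))
        lines: List[str] = [f"B {baseline}"]
        # Only files that were added or changed relative to the baseline...
        lines += [f"F {n} {h}" for n, h in sorted(files.items()) if base.get(n) != h]
        # ...plus a bare "F name" for each file removed since the baseline.
        lines += [f"F {n}" for n in sorted(base) if n not in files]
        thingy = ("\n".join(lines) + "\n").encode()
        self.thingies[uuid_of(thingy)] = thingy
        # The manifest UUID is exactly what it would be without B-cards.
        m_uuid = uuid_of(render_manifest(files))
        self.blob[m_uuid] = BlobRecord(derive_from=uuid_of(thingy))
        return m_uuid

    def get(self, uuid: str) -> bytes:
        rec = self.blob[uuid]
        if rec.content is not None:
            return rec.content
        # Record R says "interpret this thingy to generate M".
        return self._interpret(self.thingies[rec.derive_from])

    def _interpret(self, thingy: bytes) -> bytes:
        first, *rest = thingy.decode().splitlines()
        files = parse_manifest(self.get(first.split(" ")[1]))
        for ln in rest:
            parts = ln.split(" ")
            if len(parts) == 2:
                files.pop(parts[1], None)    # removed file
            else:
                files[parts[1]] = parts[2]   # added or changed file
        return render_manifest(files)
```

The optional caches (a copy of the thingy, or of M itself, inside the record) would just be extra fields on `BlobRecord`; they are left out to keep the sketch short.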

Then manifest UUIDs don't depend on whether you use B-cards or not. When 
syncing with clients that understand B-cards, send them the thingies instead of 
manifests, to minimize network traffic. When syncing with clients that don't 
understand B-cards, send them the manifests (and when you get manifests, 
optionally replace them by thingies to save storage space). Either way, you use 
the same UUID.
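Under the same hypothetical scheme, the sync-time choice could look roughly like this. The record layout and the `interpret` callback (which turns a thingy back into M) are assumptions for illustration, not part of Fossil's actual sync protocol. Both branches report the same manifest UUID.

```python
from typing import Callable, Dict, Tuple, Union

# Hypothetical blob-table record: ("full", manifest bytes) or ("derived", thingy UUID).
Record = Tuple[str, Union[bytes, str]]


def payload_for_peer(
    m_uuid: str,
    blob: Dict[str, Record],
    thingies: Dict[str, bytes],
    interpret: Callable[[bytes], bytes],
    peer_knows_bcards: bool,
) -> Tuple[str, str, bytes]:
    """Pick what to send for manifest m_uuid; the UUID is the same either way."""
    kind, value = blob[m_uuid]
    if kind == "derived":
        thingy = thingies[value]
        if peer_knows_bcards:
            # New peer: send the smaller thingy and let it interpret it locally.
            return (m_uuid, "thingy", thingy)
        # Old peer: generate M from the thingy and send M itself.
        return (m_uuid, "manifest", interpret(thingy))
    # Stored in full: every peer can take M as-is.
    return (m_uuid, "manifest", value)
```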

Fossil's mistake here was in treating the thingy as a manifest, rather than as 
a different type of artifact.

>> Which nobody cares about, which is why I said it's peanut gallerizing.
>
> Fair enough. "Nobody cares" is too strong, but the extra effort of going
> back and "fixing" that now doesn't seem justifiable.

I didn't mean to imply that anybody should care. I meant it in the sense of 
‟nobody cares how many licks it takes to get to the center of a lollipop”.

>> >> The R-card looks redundant. What's it for?
>> >
>> > Very short answer: largely historical to prove data fidelity.
>>
>> Meaning that proving data fidelity no longer requires it?
>
> The hash proves (before each commit) that what fossil just wrote to the db
> is what will be read back when the data is retrieved (if the check fails,
> fossil bails). It is "historical" in that it adds a layer of protection
> which has never (AFAIK) proven to be absolutely necessary (but is still a
> good security net when working on new fossil code).

But that extra layer of protection doesn't appear to require the R-card. That's 
why I said it looks redundant. Even if you want that security net, you can 
still get rid of the R-card, and Fossil's data format will still have the 
necessary hashes to provide that security net.

Specifically:
0. The hash of every manifest M (which can be used to verify M) is stored in 
every other manifest that mentions M, because the hash itself is used as the 
name by which M is mentioned. In general, anywhere you mention a manifest, you 
use its hash to do so.
1. Likewise, the hash of every file F (which can be used to verify F) is stored 
in every manifest that mentions F.

IIUC, both #0 and #1 were true even when Fossil was first created. Yet #0 and 
#1 combined make the R-card redundant. So it appears that the R-card isn't just 
redundant now, but has always been redundant.

#0 alone makes the Z-card redundant.
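To make #0 and #1 concrete, here is a toy check in Python. The F-card hashes use SHA1, as early Fossil did; `r_card` follows my reading of Fossil's file-format documentation (MD5 over each file's name, size and content, in sorted name order, ASCII names assumed) and should be treated as an approximation. All function names are hypothetical. A corrupted file is caught by the F-card hashes alone, and a manifest whose name is its hash is checked the same way, which is the job the Z-card's checksum would otherwise do.

```python
import hashlib
from typing import Dict


def sha1_hex(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()


def f_card_hashes(files: Dict[str, bytes]) -> Dict[str, str]:
    # What a manifest's F-cards record (#1): each file name plus the hash of its content.
    return {name: sha1_hex(data) for name, data in files.items()}


def r_card(files: Dict[str, bytes]) -> str:
    # Approximate R-card: MD5 over "name SP size LF content" for each file, sorted by name.
    md5 = hashlib.md5()
    for name in sorted(files):
        data = files[name]
        md5.update(f"{name} {len(data)}\n".encode())
        md5.update(data)
    return md5.hexdigest()


def verify_with_f_cards(cards: Dict[str, str], files: Dict[str, bytes]) -> bool:
    # No R-card needed: same set of names, and every file matches its recorded hash.
    return cards.keys() == files.keys() and all(
        sha1_hex(files[name]) == h for name, h in cards.items()
    )


def verify_manifest(m_uuid: str, manifest: bytes) -> bool:
    # #0: a manifest's name *is* its hash, so anything that mentions it can check it.
    return sha1_hex(manifest) == m_uuid
```

Both the R-card and the F-card check detect the same corruption; the difference is that the F-card hashes (and the manifest's own name) are already there.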

You might then ask, what if you never mention a particular manifest anywhere? 
Well if none of your data ever mentions it, then obviously your data doesn't 
depend on it. Which means you don't care if it's correct. In fact, you might as 
well delete it. Fossil is designed such that it _does_ always mention all your 
historical manifests (if it doesn't, that's a Git-style design flaw), so you 
always have a way to verify them.

And if you want to quickly verify all your manifests without having to walk 
your history graph to get all the hashes, remember you already have a 
convenient cache of the hashes: they're stored as UUIDs in the blob table.
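A rough sketch of that bulk check, using Python's built-in `sqlite3`. The two-column `blob(uuid, content)` schema with zlib-compressed content is a simplification assumed for illustration; Fossil's real blob table has more columns and can also store content as deltas, which this ignores.

```python
import hashlib
import sqlite3
import zlib
from typing import List


def open_demo_db() -> sqlite3.Connection:
    # Hypothetical, simplified schema: uuid is the hash of the uncompressed bytes.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE blob(uuid TEXT PRIMARY KEY, content BLOB)")
    return db


def store(db: sqlite3.Connection, data: bytes) -> str:
    uuid = hashlib.sha1(data).hexdigest()
    db.execute("INSERT INTO blob(uuid, content) VALUES (?, ?)", (uuid, zlib.compress(data)))
    return uuid


def verify_all(db: sqlite3.Connection) -> List[str]:
    """Return the UUIDs whose stored content no longer hashes to the UUID."""
    bad = []
    for uuid, content in db.execute("SELECT uuid, content FROM blob"):
        if hashlib.sha1(zlib.decompress(content)).hexdigest() != uuid:
            bad.append(uuid)
    return bad
```

No history walk is needed: one table scan covers every artifact, manifests included.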

> Delta manifests only shrink the manifests (saving
> notable amounts of space for the NetBSD repo), but makes processing them
> more expensive in many cases (e.g. a manifest with a parent may (depending
> on context) require loading the parent manifest). i don't recall off-hand
> exactly which algos (other than checkin) are (computationally speaking)
> most affected by B-card compression.

Well then, simply cache a decompressed (i.e. not B-card-delta-compressed) copy 
of each manifest (or at least, of some strategically chosen ones), which 
enables a space-time tradeoff. The NetBSD people would probably be happy to 
spend a few GB of disk space on that cache in exchange for Fossil being fast. 
As with Fossil's other caches and indexes, this would be built locally, not 
transferred over the network when syncing.
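A sketch of such a cache, assuming a toy in-memory representation (hypothetical, not Fossil's): each manifest is a (baseline UUID or None, {name: hash or None}) pair, where a None hash marks a file removed relative to the baseline. Expanded manifests are memoized in a purely local dict, so each baseline is expanded at most once.

```python
from typing import Dict, Optional, Tuple

# Hypothetical toy representation: (baseline UUID or None, {name: hash, or None if removed}).
Manifest = Tuple[Optional[str], Dict[str, Optional[str]]]


class ManifestStore:
    def __init__(self, manifests: Dict[str, Manifest]) -> None:
        self.manifests = manifests
        # Local-only cache of fully expanded manifests; never sent when syncing.
        self.cache: Dict[str, Dict[str, str]] = {}
        self.loads = 0  # how many manifests had to be expanded (cache misses)

    def expanded(self, uuid: str) -> Dict[str, str]:
        if uuid in self.cache:
            return self.cache[uuid]
        self.loads += 1
        baseline, cards = self.manifests[uuid]
        # Start from a copy of the (cached) baseline so the cache is never mutated.
        files = dict(self.expanded(baseline)) if baseline else {}
        for name, h in cards.items():
            if h is None:
                files.pop(name, None)   # file removed relative to the baseline
            else:
                files[name] = h         # file added or changed
        self.cache[uuid] = files
        return files
```

Caching every manifest is the simplest policy; a real cache could keep only strategically chosen ones (for example, frequently used baselines) to bound disk use. The returned dicts are shared with the cache, so callers should treat them as read-only.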
_______________________________________________
fossil-users mailing list
[email protected]
http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users
