Stephan Beal wrote:
>> Fossil tries to keep the storage of content a separate matter from the
>> content itself, but doesn't fully succeed.
>
> Yes and no. It uses sqlite (the storage) to do a lot of the work, but the
> manifests contain all the raw metadata needed _except_ for the mapping of
> UUID to blob content (which is what the blob table does).

I don't understand why you say "yes and no" here. Are you saying Fossil fails
to keep storage and content matters separate just because the blob table maps
UUID to content? That part isn't where Fossil fails at all.

>> The manifest is an essential part of the content, since it contains
>> filenames, which are themselves content. However, some manifests also
>> contain a B-card for delta compression, which is a storage matter.
>
> _Delta_ compression is a syntax extension for manifests only, and has no
> effect on the system outside of how those manifests are produced/parsed, so
> i would say B-cards are very much part of the metadata.
>
> _Content_ compression is indeed a storage issue, and is not mentioned
> anywhere in the manifests.

There's content that consists of the content of regular files. And there's 
content that consists of directory entries, including filenames. Filenames are 
metadata for files, but that metadata is still part of the content.

The distinction I'm making between content and storage matters is that if you 
copy stuff from one filesystem to another, then inode numbers, block numbers, 
and pointers internal to the filesystem all change, and the user doesn't care, 
but the content (including file content and filenames) does not change, and the 
user would care if it did. The content consists of all and only the stuff that 
must not change.

In the case of Fossil, manifests are the canonical structure for recording 
filenames. If you change that part of a manifest, the user would care. So you 
must not change it. This is why manifests are part of the content.

However, if you were to remove a B-card from a manifest and insert/change all 
the other cards necessary to compensate (usually making the manifest much 
bigger), or insert a B-card into a manifest and delete/change all the other 
cards necessary to compensate (usually making the manifest much smaller), the 
user would not care, because the content would remain the same, _if_ other 
parts of the manifest weren't part of the content. Using delta compression or 
not is not something that affects the content, just like using zlib compression 
or not is not something that affects the content; it's a storage issue, like 
inode numbers are.

However, Fossil puts both filenames (which are content), and the indicator of 
whether delta compression is used, in the same structure, and changes that 
structure depending on whether delta compression is used, and the hash of that 
full structure is used as a content-UUID in future content. This is the 
problem, just like changing a blob's UUID depending on whether the blob is 
gzip-compressed would be a problem.
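
To make this concrete, here is a minimal sketch in Python. It uses a toy card format (only F and B lines, filenames without spaces) that is much simpler than Fossil's real manifest syntax, and it uses SHA-1 purely for illustration. The helper names (sha1_hex, render_full, render_delta) are hypothetical. It shows two encodings of the same set of filenames and file hashes ending up with different hashes.

```python
import hashlib

def sha1_hex(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()

def render_full(files: dict) -> bytes:
    """Toy full manifest: one 'F name hash' line per file."""
    return "".join(f"F {n} {h}\n" for n, h in sorted(files.items())).encode()

def render_delta(baseline_id: str, baseline_files: dict, files: dict) -> bytes:
    """Toy delta manifest: a B line plus only the entries that differ."""
    lines = [f"B {baseline_id}\n"]
    for name in sorted(set(baseline_files) | set(files)):
        if name not in files:
            lines.append(f"F {name}\n")                # removed since baseline
        elif baseline_files.get(name) != files[name]:
            lines.append(f"F {name} {files[name]}\n")  # added or changed
    return "".join(lines).encode()

baseline_files = {"README": sha1_hex(b"v1"), "main.c": sha1_hex(b"int x;")}
current_files = dict(baseline_files, README=sha1_hex(b"v2"))

baseline_id = sha1_hex(render_full(baseline_files))

# Two encodings of exactly the same tree (current_files).
full_form = render_full(current_files)
delta_form = render_delta(baseline_id, baseline_files, current_files)

# The tree is identical, but the hash of the encoding is not.
ids_differ = sha1_hex(full_form) != sha1_hex(delta_form)
```

In this toy model the choice of encoding, which is a storage decision, leaks into the identifier that later records would refer to.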

If you store a file uncompressed, or gzip-compressed, or lz4-compressed, or 
fragment the file in the filesystem, or change the inode number, or store the 
file in a blob table in an SQLite DB, the hash of the file remains unchanged.
You can gzip-compress or uncompress old files without having to recompute any 
hashes. But if you store a manifest (which is content, just like a file is) 
uncompressed vs. delta-compressed, the hash of the manifest _does_ change. This 
means you can't go back and delta-compress or uncompress old manifests without
recomputing the hashes of everything that came later.
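
A rough sketch of the difference, again in Python with a made-up card format and SHA-1 chosen only for illustration. The manifests below are hand-written toy strings, and the claim that parent_full and parent_delta describe the same tree is an assumption of the example, not something the code checks.

```python
import hashlib
import zlib

def sha1_hex(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()

# Case 1: storage-level compression of a file.
# The ID is taken over the logical bytes, so re-encoding does not move it.
file_bytes = b"some file content\n" * 50
stored = zlib.compress(file_bytes)
file_id_before = sha1_hex(file_bytes)
file_id_after = sha1_hex(zlib.decompress(stored))

# Case 2: a toy chain of manifests, each naming its parent by hash.
def child_manifest(parent_id: str) -> bytes:
    return f"P {parent_id}\nF main.c 444\n".encode()

# Assumed to describe the same tree in two different encodings.
parent_full = b"F README 111\nF main.c 222\n"
parent_delta = b"B 000baseline\nF README 111\n"

# The child's ID depends on which encoding the parent happened to use.
child_id_a = sha1_hex(child_manifest(sha1_hex(parent_full)))
child_id_b = sha1_hex(child_manifest(sha1_hex(parent_delta)))
```

Here file_id_before equals file_id_after, while child_id_a and child_id_b differ, so re-encoding the parent would force every descendant to be rehashed.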

Because manifests are part of the content, any kind of compression of them 
(including delta-compression) _is_ content compression.

Fully separating content and storage matters requires that the hash of the 
content not change, regardless of how you store the content. Fossil's manifest 
structure fails to achieve this separation. ZFS's data structures also fail to 
achieve this separation. OTOH, Git's data structures do achieve this separation.

You could eliminate the B-card dependency by having the UUID of the manifest be 
the hash of the manifest if the manifest has no B-card (which is how you do it 
now), but be the hash of the uncompressed version of the manifest if the 
manifest does have a B-card. For efficiency, you could still store the hash of 
the compressed version, but just as part of an index, and that hash wouldn't be 
the manifest's UUID. Then you could delta-compress or uncompress without 
affecting future records, just like you can gzip-compress or uncompress now.
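
A sketch of that scheme, under the same assumptions as before: a toy F/B card format rather than Fossil's real syntax, SHA-1 for illustration only, and hypothetical helper names (parse, expand, manifest_uuid, storage_hash). The store here is just a dict from UUID to whatever bytes were stored.

```python
import hashlib

def sha1_hex(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()

def parse(manifest: bytes):
    """Return (baseline_id or None, {name: hash, or None if removed})."""
    baseline, entries = None, {}
    for line in manifest.decode().splitlines():
        parts = line.split(" ")
        if parts[0] == "B":
            baseline = parts[1]
        elif parts[0] == "F":
            entries[parts[1]] = parts[2] if len(parts) > 2 else None
    return baseline, entries

def render_full(files: dict) -> bytes:
    return "".join(f"F {n} {h}\n" for n, h in sorted(files.items())).encode()

def expand(manifest: bytes, store: dict) -> bytes:
    """Rewrite a manifest into its full form, with no B line."""
    baseline, entries = parse(manifest)
    if baseline is None:
        return render_full(entries)
    _, files = parse(expand(store[baseline], store))
    for name, h in entries.items():
        if h is None:
            files.pop(name, None)   # removed relative to the baseline
        else:
            files[name] = h         # added or changed
    return render_full(files)

def manifest_uuid(manifest: bytes, store: dict) -> str:
    """Identity: always the hash of the expanded form."""
    return sha1_hex(expand(manifest, store))

def storage_hash(manifest: bytes) -> str:
    """Index-only hash of the bytes as stored; never used as the UUID."""
    return sha1_hex(manifest)

store = {}
base = b"F README 111\nF main.c 222\n"
base_id = manifest_uuid(base, store)
store[base_id] = base

full = b"F README 333\nF main.c 222\n"
delta = f"B {base_id}\nF README 333\n".encode()

uuid_full = manifest_uuid(full, store)
uuid_delta = manifest_uuid(delta, store)
```

With this, uuid_full and uuid_delta come out equal while storage_hash(full) and storage_hash(delta) differ, so switching an old manifest between the two encodings only touches the index, not any later record.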

Which nobody cares about, which is why I said it's peanut gallerizing.

>> The R-card looks redundant. What's it for?
>
> Very short answer: largely historical to prove data fidelity.

Meaning that proving data fidelity no longer requires it?

> It has in the
> meantime been made optional, but fossil still calculates it by default
> (and it's a very expensive calculation).

Is this why NetBSD was having trouble with Fossil? I.e. if they disable this 
option, it will solve their problem?
