Stephan Beal wrote:
>> Fossil tries to keep the storage of content a separate matter from the
>> content itself, but doesn't fully succeed.
>
> Yes and no. It uses sqlite (the storage) to do a lot of the work, but the
> manifests contain all the raw metadata needed _except_ for the mapping of
> UUID to blob content (which is what the blob table does).

I don't understand why you say "yes and no" here. Are you saying Fossil
falls short of keeping storage and content matters separate just because
the blob table maps UUID to content? That part isn't where Fossil fails
at all.

>> The manifest is an essential part of the content, since it contains
>> filenames, which are themselves content. However, some manifests also
>> contain a B-card for delta compression, which is a storage matter.
>
> _Delta_ compression is a syntax extension for manifests only, and have no
> effect on the system outside of how their manifests are produced/parsed, so
> i would say B-cards are very much part of the metadata.
>
> _Content_ compression is indeed a storage issue, and is not mentioned
> anywhere in the manifests.

There's content that consists of the content of regular files. And
there's content that consists of directory entries, including filenames.
Filenames are metadata for files, but that metadata is still part of the
content.

The distinction I'm making between content and storage matters is this:
if you copy stuff from one filesystem to another, then inode numbers,
block numbers, and pointers internal to the filesystem all change, and
the user doesn't care. The content (including file content and
filenames) does not change, and the user would care if it did. The
content consists of all and only the stuff that must not change.

In the case of Fossil, manifests are the canonical structure for
recording filenames. If you change that part of a manifest, the user
would care, so you must not change it. This is why manifests are part of
the content.

However, suppose you removed a B-card from a manifest and inserted or
changed all the other cards necessary to compensate (usually making the
manifest much bigger), or inserted a B-card into a manifest and deleted
or changed all the other cards necessary to compensate (usually making
the manifest much smaller). The user would not care, because the content
would remain the same, _if_ the other parts of the manifest weren't part
of the content. Using delta compression or not doesn't affect the
content, just as using zlib compression or not doesn't affect the
content; it's a storage issue, like inode numbers are.

However, Fossil puts both filenames (which are content) and the
indicator of whether delta compression is used in the same structure,
changes that structure depending on whether delta compression is used,
and uses the hash of that full structure as a content UUID in future
content. This is the problem, just as changing a blob's UUID depending
on whether the blob is gzip-compressed would be a problem.

If you store a file uncompressed, or gzip-compressed, or lz4-compressed,
or fragment the file in the filesystem, or change the inode number, or
store the file in a blob table in an SQLite DB, the hash of the file
remains unchanged. You can gzip-compress or uncompress old files without
having to recompute any hashes. But if you store a manifest (which is
content, just like a file is) uncompressed vs. delta-compressed, the
hash of the manifest _does_ change. This means you can't go back and
delta-compress or uncompress old manifests without recomputing every
later hash that depends on them.

Because manifests are part of the content, any kind of compression of
them (including delta compression) _is_ content compression. Fully
separating content and storage matters requires that the hash of the
content not change, regardless of how you store the content. Fossil's
manifest structure fails to achieve this separation.

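To make the difference concrete, here is a minimal sketch in Python. It
is only an illustration under loud assumptions: the manifest format is a
toy (just B and F cards, made-up file hashes), not Fossil's real syntax,
and content_id, expand, and store are hypothetical names of mine.

```python
import hashlib
import zlib


def content_id(data: bytes) -> str:
    """Hash of the logical bytes, independent of how they are stored."""
    return hashlib.sha1(data).hexdigest()


# A regular file: the storage layer may compress it, the hash does not move.
file_content = b"hello, world\n" * 100
stored_zlib = zlib.compress(file_content)           # a storage decision
id_raw = content_id(file_content)
id_after_roundtrip = content_id(zlib.decompress(stored_zlib))
assert id_raw == id_after_roundtrip


def expand(manifest: str, store: dict) -> dict:
    """Return the filename -> file-hash mapping a toy manifest describes.

    Assumed toy rules: 'B <id>' pulls in a baseline manifest from `store`,
    'F <name> <hash>' adds or overrides a file, 'F <name>' deletes it.
    """
    files = {}
    for line in manifest.splitlines():
        card, *args = line.split(" ")
        if card == "B":
            files.update(expand(store[args[0]], store))
        elif card == "F" and len(args) == 2:
            files[args[0]] = args[1]
        elif card == "F":
            files.pop(args[0], None)
    return files


baseline = "F a.txt 1111\nF b.txt 2222\nF c.txt 3333\n"
store = {content_id(baseline.encode()): baseline}

# The same check-in written two ways: in full, and as a delta on baseline.
full = "F a.txt 1111\nF b.txt 9999\nF c.txt 3333\n"
delta = f"B {content_id(baseline.encode())}\nF b.txt 9999\n"

same_content = expand(full, store) == expand(delta, store)
same_hash = content_id(full.encode()) == content_id(delta.encode())
assert same_content and not same_hash
```

Both manifests describe exactly the same set of filenames and file
hashes, yet their own hashes differ, so anything that refers to the
check-in by hash depends on which representation was chosen.
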
ZFS's data structures also fail to achieve this separation. OTOH, Git's
data structures do achieve it.

You could eliminate the B-card dependency by having the UUID of the
manifest be the hash of the manifest if the manifest has no B-card
(which is how you do it now), but be the hash of the uncompressed
version of the manifest if the manifest does have a B-card. For
efficiency, you could still store the hash of the compressed version,
but only as part of an index; that hash wouldn't be the manifest's UUID.
Then you could delta-compress or uncompress without affecting future
records, just like you can gzip-compress or uncompress now.

Which nobody cares about, which is why I said it's peanut gallerizing.

>> The R-card looks redundant. What's it for?
>
> Very short answer: largely historical to prove data fidelity.

Meaning that proving data fidelity no longer requires it?

> It has in the
> mean time been made optional, but fossil still calculates it by default
> (and it's a very expensive calculation).

Is this why NetBSD was having trouble with Fossil? I.e., if they disable
this option, will it solve their problem?
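
P.S. The UUID scheme I proposed above (hash the manifest as-is when it
has no B-card, hash its uncompressed form when it does) could be
sketched roughly like this. Again, this assumes a toy manifest format
with only B and F cards and sorted F cards; uncompressed, manifest_uuid,
and storage_index are hypothetical names, not anything in Fossil.

```python
import hashlib


def sha1_hex(text: str) -> str:
    return hashlib.sha1(text.encode()).hexdigest()


def expand(manifest: str, store: dict) -> dict:
    """Toy rules (assumed): 'B <uuid>' pulls in a baseline from `store`,
    'F <name> <hash>' adds or overrides a file, 'F <name>' deletes it."""
    files = {}
    for line in manifest.splitlines():
        card, *args = line.split(" ")
        if card == "B":
            files.update(expand(store[args[0]], store))
        elif card == "F" and len(args) == 2:
            files[args[0]] = args[1]
        elif card == "F":
            files.pop(args[0], None)
    return files


def uncompressed(manifest: str, store: dict) -> str:
    """Return the manifest unchanged if it has no B-card; otherwise rebuild
    the equivalent full manifest with F cards in sorted order."""
    if not manifest.startswith("B "):
        return manifest
    files = expand(manifest, store)
    return "".join(f"F {name} {h}\n" for name, h in sorted(files.items()))


def manifest_uuid(manifest: str, store: dict) -> str:
    """Proposed UUID: always the hash of the uncompressed form."""
    return sha1_hex(uncompressed(manifest, store))


baseline = "F a.txt 1111\nF b.txt 2222\nF c.txt 3333\n"
store = {manifest_uuid(baseline, {}): baseline}

full = "F a.txt 1111\nF b.txt 9999\nF c.txt 3333\n"
delta = f"B {manifest_uuid(baseline, store)}\nF b.txt 9999\n"

# Either representation now yields the same UUID.
assert manifest_uuid(full, store) == manifest_uuid(delta, store)

# The hash of the stored (compressed) bytes survives only as an index
# entry pointing at the UUID; it is not the UUID itself.
storage_index = {sha1_hex(delta): manifest_uuid(delta, store)}
```

With this, switching a stored manifest between the full and delta forms
changes only the index entry, not the UUID that later records refer to.
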

