On Mon, Jan 12, 2015 at 12:22 PM, Kelly Dean <[email protected]> wrote:
> I don't understand why you say ‟yes and no” here; are you saying Fossil
> fails to fully succeed in keeping storage and content matters separate just
> because the blob table maps UUID to content? This part isn't where Fossil
> fails at all.

Fossil's _model_ keeps them separate, but for practical purposes a database
is required for storage. It would be exceedingly difficult to implement all
the same features of this model using a non-relational system.

> There's content that consists of the content of regular files. And there's
> content that consists of directory entries, including filenames. Filenames
> are metadata for files, but that metadata is still part of the content.

Fossil separates the names and "body" of a file, and any number of files may
refer to the same blob body. The zlib-level compression is part of the
implementation, not explicitly accounted for anywhere in the model. The
compression of file lists (part of the B-card stuff) is a manifest-specific
syntax. Manifests are themselves stored as blobs, so they may _also_ undergo
app-level (zlib) compression.

> The distinction I'm making between content and storage matters is that if
> you copy stuff from one filesystem to another, then inode numbers, block
> numbers, and pointers internal to the filesystem all change, and the user
> doesn't care, but the content (including file content and filenames) does
> not change, and the user would care if it did. The content consists of all
> and only the stuff that must not change.

i'm not sure where you're going with that. Copying a fossil repo has no
effect on the content, nor would (hypothetically) converting content from the
current fossil impl to (e.g.) a second-generation impl with a new storage
layer. It cannot simply be _copied_ between SCM systems because of (e.g.) the
zlib compression and other internal details, but it could be _exported_ with
100% fidelity.
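A minimal Python sketch of the name/body split and the "zlib is only a storage detail" point made above. This is not Fossil's code: the `BlobStore` class, the `names` mapping, and the use of SHA-1 over the uncompressed bytes are assumptions made purely for illustration.

```python
import hashlib
import zlib


class BlobStore:
    """Toy content-addressed store (hypothetical, not Fossil's schema)."""

    def __init__(self):
        # blob ID -> zlib-compressed bytes; the compression is a storage detail
        self._rows = {}

    def put(self, content: bytes) -> str:
        # The ID is computed from the *uncompressed* content, so changing
        # or dropping the compression never changes the ID.
        blob_id = hashlib.sha1(content).hexdigest()
        self._rows.setdefault(blob_id, zlib.compress(content))
        return blob_id

    def get(self, blob_id: str) -> bytes:
        return zlib.decompress(self._rows[blob_id])

    def __len__(self) -> int:
        return len(self._rows)


store = BlobStore()

# Hypothetical filename -> blob ID mapping: two names, one shared body.
names = {
    "README.txt": store.put(b"hello fossil\n"),
    "docs/README.txt": store.put(b"hello fossil\n"),
}
```

Both names end up pointing at the same blob ID, and an export only needs `get()` to recover the original bytes, whatever the store does internally.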
> In the case of Fossil, manifests are the canonical structure for recording
> filenames. If you change that part of a manifest, the user would care. So
> you must not change it. This is why manifests are part of the content.

You _cannot_ change it, so i'm not sure what your point is there. All fossil
data is immutable.

> However, if you were to remove a B-card from a manifest and insert/change
> all the other cards necessary to compensate (usually making the manifest
> much bigger), or insert a B-card into a
> manifest and delete/change all the other cards necessary to compensate
> (usually making the manifest much smaller), the user would not care,
> because the content would remain the same, _if_ other parts of the manifest
> weren't part of the content. Using delta compression or not is not
> something that affects the content, just like using zlib compression or not
> is not something that affects the content; it's a storage issue, like inode
> numbers are.

(Addressed further below.)

> However, Fossil puts both filenames (which are content), and the indicator
> of whether delta compression is used, in the same structure, and changes
> that structure depending on whether delta compression is used, and the hash
> of that full structure is used as a content-UUID in future content. This is
> the problem, just like changing a blob's UUID depending on whether the blob
> is gzip-compressed would be a problem.

Correct. To be fair: B-cards (manifest/delta compression) were added long
after the initial manifest design, and had to be done in a compatible manner.

> just like a file is) uncompressed vs. delta-compressed, the hash of the
> manifest _does_ change. This means you can't go back and delta-compress or
> un-compress old manifests without recomputing all the later hashes of
> everything.

Correct.

> Because manifests are part of the content, any kind of compression of them
> (including delta-compression) _is_ content compression.
Okay, i can now agree with that.

> Fully separating content and storage matters requires that the hash of the
> content not change, regardless of how you store the content. Fossil's
> manifest structure fails to achieve this separation. ZFS's data structures
> also fail to achieve this separation. OTOH, Git's data structures do
> achieve this separation.

i'm not going down that road ;).

> Which nobody cares about, which is why I said it's peanut gallerizing.

Fair enough. "Nobody cares" is too strong, but the extra effort of going back
and "fixing" that now doesn't seem justifiable. It is an interesting
"problem," though.

> >> The R-card looks redundant. What's it for?
> >
> > Very short answer: largely historical to prove data fidelity.
>
> Meaning that proving data fidelity no longer requires it?

The hash proves (before each commit) that what fossil just wrote to the db is
what will be read back when the data is retrieved (if the check fails, fossil
bails). It is "historical" in that it adds a layer of protection which has
never (AFAIK) proven to be absolutely necessary (but is still a good security
net when working on new fossil code).

> > It has in the mean time been made optional, but fossil still calculates
> > it by default (and it's a very expensive calculation).
>
> Is this why NetBSD was having trouble with Fossil? I.e. if they disable
> this option, it will solve their problem?

Part of it, but not all. Delta manifests only shrink the manifests (saving
notable amounts of space for the NetBSD repo), but make processing them more
expensive in many cases (e.g. a manifest with a parent may (depending on
context) require loading the parent manifest). i don't recall off-hand
exactly which algos (other than checkin) are (computationally speaking) most
affected by B-card compression.

--
----- stephan beal
http://wanderinghorse.net/home/stephan/
http://gplus.to/sgbeal
"Freedom is sloppy. But since tyranny's the only guaranteed byproduct of those
who insist on a perfect world, freedom will have to do." -- Bigby Wolf
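A small Python sketch of the B-card points discussed above: the same file list written as a full manifest and as a delta against a baseline hashes differently, and reading the delta form means loading the baseline first. The card layout is a deliberately simplified stand-in for real manifest syntax, and `artifact_id`, `parse`, `resolve`, the short fake blob IDs, and the choice of SHA-1 are all assumptions for illustration only.

```python
import hashlib


def artifact_id(text: str) -> str:
    # Assumption: the ID is a SHA-1 over the manifest text exactly as written.
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def parse(text: str):
    """Return (baseline_id_or_None, {filename: blob_id_or_None})."""
    baseline = None
    files = {}
    for line in text.splitlines():
        card, _, rest = line.partition(" ")
        if card == "B":
            baseline = rest
        elif card == "F":
            name, _, blob = rest.partition(" ")
            # In this toy format, an F-card without a blob ID in a delta
            # means "file removed relative to the baseline".
            files[name] = blob or None
    return baseline, files


def resolve(text: str, load):
    """Expand a manifest to its full file list; `load` fetches text by ID."""
    baseline, files = parse(text)
    if baseline is None:
        return files
    # The extra cost of the delta form: the baseline must be loaded first.
    merged = dict(resolve(load(baseline), load))
    for name, blob in files.items():
        if blob is None:
            merged.pop(name, None)
        else:
            merged[name] = blob
    return merged


base = "F a.txt 1111\nF b.txt 2222\n"
base_id = artifact_id(base)
repo = {base_id: base}

full = "F a.txt 1111\nF b.txt 3333\n"       # same check-in, no B-card
delta = f"B {base_id}\nF b.txt 3333\n"       # same check-in, as a delta

same_files = resolve(full, repo.__getitem__) == resolve(delta, repo.__getitem__)
same_id = artifact_id(full) == artifact_id(delta)
```

Here `same_files` is true while `same_id` is false: the file list is identical, but anything that later refers to the check-in by its hash sees two different IDs.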

