On Thu, Jan 8, 2015 at 7:34 PM, Warren Young <w...@etr-usa.com> wrote:

> Fossil does an O(N) scan over the entire DB as an extra integrity check,
> on the assumption that the filesystem may not be reliable.

The N potentially has another multiplier factor depending on how many
deltas it has to weed through to get to the version in question of each
file. But yeah, i guess we could say it's effectively O(N) (and
conceptually it certainly is).

> (It’s a good assumption unless you’ve taken some uncommon steps to ensure
> that it *is* reliable. See “Disks from the Perspective of a File System,”
> by Marshall Kirk McKusick in ACM Queue: http://goo.gl/hHvdQ8)

i think it's fair to say that Richard's fairly well-versed in the topic of
disk reliability ;).

> > What is the limiting factor?
>
> The balance between your patience and your disk’s I/O throughput.

Pretty much! Even the largest repos will _eventually_ finish, provided they
don't overstep sqlite/system limits. Repos with large blobs might fail on
systems with very constrained memory/virtual memory. e.g. if you have a 2GB
blob in your DB, Fossil needs (at times) to allocate more than twice that
(e.g. when diffing versions, as the delta generation algorithm requires all
content to be in memory). A Raspberry Pi without enough swap space (virtual
memory) could easily choke on that (and would take ages swapping out to SD
Card).

> This problem has historically been ignored since SQLite’s repo was viewed
> as “large” at ~50 MiB.

Interestingly, i've had to face philosophical questions in libfossil in
this regard. My instinct tells me to (like fossil) calculate the R-card by
default, but the optimizer and memory allocator in me screams out "no!" So
there's a toggle, but the "final" default is as yet undecided.

> > Is there a path to improve this performance similar to the SQLite
> > speed gains in the last 2 years?
>
> The SQLite improvements improve Fossil’s speed, too.
>
> I wouldn’t recommend turning off repo-cksum unless you are storing your
> fossils on uncommonly-durable storage:
>
> 1. Battery-backed hardware RAID; or
>
> 2. A filesystem that does data checksumming itself, like ZFS, so that
> Fossil’s data checksumming is redundant.

For "small" repos (which covers the vast majority of single-user/small-team
repos), there's little reason to disable it. It was (AFAIK) never
recognized as a "problem" until Fossil was pushed into service in "very
large" repos. Perhaps Richard recognized the potential for this during the
initial design, but had a backwards-compatible long-term fix should it ever
pose a problem: simply make it (the R-card) an optional part of the
manifest. Whether that was accident or foresight, i can't say, but based on
my experience with Richard i'd bet it was foresight.

-- 
----- stephan beal
http://wanderinghorse.net/home/stephan/
http://gplus.to/sgbeal
"Freedom is sloppy. But since tyranny's the only guaranteed byproduct of
those who insist on a perfect world, freedom will have to do." -- Bigby Wolf
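
A footnote on the R-card discussed above: a minimal Python sketch of the
kind of checksum involved, assuming the layout described in Fossil's
file-format documentation (an MD5 over each file's pathname, a space, its
size in decimal, a newline, and then its content, with files taken in
sorted order). The function name r_card_checksum and the dict-of-bytes
input are hypothetical, and this is not Fossil's actual code. It mainly
illustrates why the check is O(N): every byte of every file in the
check-in gets hashed.

```python
import hashlib


def r_card_checksum(files):
    """Return a hex MD5 over all files, R-card style (assumed layout).

    `files` is a hypothetical mapping of repo-relative pathname (str)
    to file content (bytes).
    """
    md5 = hashlib.md5()
    # Files are processed in sorted pathname order so the result does
    # not depend on how the caller happened to order them.
    for name in sorted(files):
        content = files[name]
        # Assumed per-file layout: "<pathname> <size>\n<content>"
        md5.update(name.encode("utf-8"))
        md5.update(b" " + str(len(content)).encode("ascii") + b"\n")
        # Every byte of every file is hashed, hence the O(N) cost.
        md5.update(content)
    return md5.hexdigest()
```

Skipping this step (the toggle mentioned above) saves reading and hashing
all of that content, at the price of losing the end-to-end integrity check.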
_______________________________________________
fossil-users mailing list
fossil-users@lists.fossil-scm.org
http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users