On Thu, Jan 8, 2015 at 7:34 PM, Warren Young <w...@etr-usa.com> wrote:

> Fossil does an O(N) scan over the entire DB as an extra integrity check,
> on the assumption that the filesystem may not be reliable.
>

The N potentially carries an extra multiplier, depending on how many
deltas Fossil has to weed through to reach the version in question of each
file. But yeah, i guess we could say it's effectively O(N) (and
conceptually it certainly is).
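
To make that multiplier concrete, here is a toy cost model in Python. It is
only a sketch: the `Blob` class, its `base` link and the append-only "delta"
are invented for illustration and are not Fossil's real delta encoding or
schema. The point is just that checking N files costs the sum of their
delta-chain lengths rather than N.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Blob:
    """Hypothetical stored artifact: full content, or a delta against `base`."""
    payload: bytes
    base: Optional[Blob] = None  # None means payload is the full content


def expand(blob: Blob) -> tuple[bytes, int]:
    """Return (content, number of blobs read) by walking the chain to its root."""
    chain = []
    node = blob
    while node is not None:
        chain.append(node)
        node = node.base
    content = b""
    # Apply from the root (full content) outwards; the toy "delta" just appends.
    for node in reversed(chain):
        content += node.payload
    return content, len(chain)


def scan_cost(files: list[Blob]) -> int:
    """Total blobs read to reconstruct every file of one check-in."""
    return sum(expand(f)[1] for f in files)


root = Blob(b"a")
v2 = Blob(b"b", base=root)
v3 = Blob(b"c", base=v2)
print(scan_cost([root, v3]))  # 1 + 3 = 4 blob reads for 2 files
```

With short chains the total stays close to N, which is why calling it
"effectively O(N)" is reasonable.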


> (It’s a good assumption unless you’ve taken some uncommon steps to ensure
> that it *is* reliable.  See “Disks from the Perspective of a File System,”
> by Marshall Kirk McKusick in ACM Queue: http://goo.gl/hHvdQ8)
>

i think it's fair to say that Richard's fairly well-versed in the topic of
disk reliability ;).


> >    What is the limiting factor?
>
> The balance between your patience and your disk’s I/O throughput.
>

Pretty much! Even the largest repos will _eventually_ finish, provided they
don't overstep sqlite/system limits. Repos with large blobs might fail on
systems with very constrained memory/virtual memory. For example, if you
have a 2GB blob in your DB, Fossil needs (at times) to allocate more than
twice that, e.g. when diffing versions, because the delta generation
algorithm requires all content to be in memory. A Raspberry Pi without
enough swap space (virtual memory) could easily choke on that (and would
take ages swapping out to its SD card).
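
As a back-of-the-envelope check of that arithmetic, here is a small Python
sketch. The function name, the default factor of 2 and the RAM/swap figures
in the example are assumptions made for illustration; "more than twice"
means 2x is only a lower bound, so a True result does not guarantee success.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def delta_may_fit(blob_bytes: int, ram_bytes: int,
                  swap_bytes: int = 0, factor: float = 2.0) -> bool:
    """Rough check: does factor x blob size fit into RAM plus swap?

    `factor` is a lower bound taken from the "more than twice" estimate;
    it is not a figure measured from Fossil.
    """
    return blob_bytes * factor <= ram_bytes + swap_bytes


# Hypothetical small board: 1 GiB RAM, 100 MiB swap, 2 GiB blob.
print(delta_may_fit(2 * GIB, ram_bytes=1 * GIB, swap_bytes=100 * MIB))  # False
```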

> This problem has historically been ignored since SQLite’s repo was viewed
> as “large” at ~50 MiB.
>

Interestingly, i've had to face philosophical questions in libfossil in
this regard. My instinct tells me to (like fossil) calculate the R-card by
default, but the optimizer and memory allocator in me scream out "no!" So
there's a toggle, but the "final" default is as yet undecided.
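
For readers wondering why the R-card is the costly part, a rough Python
sketch follows. It assumes the R-card is an MD5 over every file in the
check-in, hashing "name, space, size, newline, content" in sorted name
order. That layout is recalled from the Fossil file-format description and
has not been checked against Fossil's source; the function name and the
dict-of-bytes input are made up for the example.

```python
import hashlib


def r_card_sketch(files: dict[str, bytes]) -> str:
    """MD5 over 'name SPACE size NEWLINE content' per file, sorted by name.

    Assumption: the byte layout follows the file-format description as
    recalled here, not a verified reading of Fossil's code.
    """
    md5 = hashlib.md5()
    for name in sorted(files):
        content = files[name]
        md5.update(f"{name} {len(content)}\n".encode("utf-8"))
        md5.update(content)  # every byte of every file gets hashed
    return md5.hexdigest()
```

Whatever the exact layout, the cost is proportional to the total size of the
check-in's files, each of which first has to be expanded from its deltas.
That is the work a toggle would skip.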


> >    Is there a path to improve this performance similar to the SQLite
> > speed gains in the last 2 years?
>
> The SQLite improvements improve Fossil’s speed, too.
>
> I wouldn’t recommend turning off repo-cksum unless you are storing your
> fossils on uncommonly-durable storage:
>
> 1. Battery-backed hardware RAID; or
>
> 2. A filesystem that does data checksumming itself, like ZFS, so that
> Fossil’s data checksumming is redundant.
>

For "small" repos (which covers the vast majority of single-user/small-team
repos), there's little reason to disable it. It was (AFAIK) never
recognized as a "problem" until Fossil was pushed into service in "very
large" repos. Perhaps Richard recognized the potential for this during the
initial design, but had a backwards-compatible long-term fix in mind should
it ever pose a problem: simply make it (the R-card) an optional part of the
manifest.
Whether that was accident or foresight, i can't say, but based on my
experience with Richard i'd bet it was foresight.


-- 
----- stephan beal
http://wanderinghorse.net/home/stephan/
http://gplus.to/sgbeal
"Freedom is sloppy. But since tyranny's the only guaranteed byproduct of
those who insist on a perfect world, freedom will have to do." -- Bigby Wolf
_______________________________________________
fossil-users mailing list
fossil-users@lists.fossil-scm.org
http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users