On Aug 11, 2017, at 7:10 AM, Damien Sykes-Lindley <dam...@dcpendleton.plus.com> 
wrote:
> 
> I couldn't help noticing there seemed to be a silence on speed comparisons.

There have been many threads on this over the years.  Just for a start, search 
the list archives for “NetBSD”. 

> After cloning and working with several publicised Fossil repositories, I 
> can't help but notice that the majority of them are rather small.

Yes, that’s best practice for almost any DVCS.  Even the Linux kernel project 
takes this approach:

    http://blog.ffwll.ch/2017/08/github-why-cant-host-the-kernel.html

That is, the single monolithic repo you see when cloning “the Linux Git repo” 
is something of an illusion, which the developers of Linux don’t actually deal 
with very much.

> Most of the projects that I am involved with are games...Of course these will 
> contain binary files

That “of course” needn’t be a foregone conclusion.  

Many asset formats are available in text forms, which are friendly for use in 
version control systems.  For example, you may be able to store 3D models in 
the repository in COLLADA format and some 2D assets in SVG.

For the bitmapped textures, it’s better to store those as uncompressed bitmap 
formats, then compress them during the build process to whatever format you’ll 
use within the game engine and for distribution.

A 1-pixel change to a Windows BMP file causes a much smaller change in the size 
of a Fossil repository than the same change to a JPEG or PNG, because in a 
compressed format that 1-pixel difference can throw off the rest of the 
compression stream, causing much of the rest of the file to change.

This can be tricky to manage.  You might think TIFF is a good file format for 
this purpose, but you’re forgetting all the metadata in it that changes simply 
when a file is opened and re-saved.  (Timestamps, GUIDs, etc.)  It’s better to 
go with a bare “box of pixels” format like Windows BMP.
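For example, the compress-at-build-time step can be a few lines in the build 
script.  (A sketch only: it assumes ImageMagick’s “convert” tool, and the paths 
and quality setting are made up for illustration.)

```shell
# Convert source-controlled BMP textures into the distribution format.
# Paths and the -quality value are hypothetical; adjust to your pipeline.
mkdir -p build/textures
for f in assets/textures/*.bmp; do
    convert "$f" -quality 90 "build/textures/$(basename "$f" .bmp).jpg"
done
```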

All of this does make the checkout size bigger, but Fossil’s delta compression 
has two positive consequences here:

1. The Fossil repository size will probably end up as small as, or even smaller 
than, it would be with pre-compressed assets.  A zlib-compressed Windows BMP 
file is going to be about the same size as a PNG file with the same content.

2. If those files are changed multiple times between initial creation and 
product ship time, the delta compression will do a far better job if the input 
data isn’t already compressed.  This is how you get the high compression ratios 
you see on most Fossil repositories by visiting their /stat page.  My biggest 
repository is rocking along at 39:1 compression ratio, and it hasn’t been 
rebuilt and recompressed lately.
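You can see the effect behind point 2 without Fossil at all.  The sketch below 
fakes a “box of pixels” file and a one-byte edit to it: the raw files differ in 
exactly one byte, which is trivial for a delta encoder, while their gzipped 
versions diverge over much of the stream.  (gzip stands in here for any 
pre-compressed asset format; the file names are made up.)

```shell
# Build a fake 1 MiB uncompressed asset and a copy with one byte changed.
dd if=/dev/zero of=frame1.raw bs=1024 count=1024 2>/dev/null
cp frame1.raw frame2.raw
printf '\377' | dd of=frame2.raw bs=1 seek=500000 conv=notrunc 2>/dev/null

# Uncompressed: exactly one byte differs, so a delta stays tiny.
cmp -l frame1.raw frame2.raw | wc -l    # → 1

# Pre-compressed: the change ripples through the rest of the stream.
# (-n omits gzip's name/timestamp header so only content differs.)
gzip -nc frame1.raw > frame1.raw.gz
gzip -nc frame2.raw > frame2.raw.gz
cmp -l frame1.raw.gz frame2.raw.gz | wc -l
```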

> (generally an executable

Why would you include generated files in a version control repository?

Fossil is not a networked file system.  If you try to treat it like one, it 
will take its revenge on you.

> dependency libraries

In source code form only, perhaps.

Even then, it’s better to hold those in separate repositories.

It would be nice if Fossil had a sub-modules feature like Git to help with 
this, so that opening the main repository also caused sub-Fossils to be cloned 
and opened in subdirectories.  Meanwhile, you have to do manual “fossil open 
--nested” commands, but it’s a one-time hassle.
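Concretely, the manual workflow looks something like this.  (A sketch only: the 
repository URLs and directory names are made up.)

```shell
# Clone and open the main project.
fossil clone https://example.org/game game.fossil
mkdir game && cd game
fossil open ../game.fossil

# Clone the sub-project and open it *inside* the main checkout.
# --nested tells Fossil it's OK that we're already within a checkout.
fossil clone https://example.org/engine ../engine.fossil
mkdir engine && cd engine
fossil open --nested ../../engine.fossil
```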

Nested checkins would also be nice.  That is, if a file changes in a nested 
checkout, a “fossil ci” from the top level should offer to check in the changes 
on the sub-project.

> Also note that all commits were tests only and so weren't synced to remotes. 
> Naturally this means that commits are even slower when syncing.

It also means that local differences are a smaller percentage of the total time 
taken for many operations, since the time may be swamped by network I/O.

For instance, I notice in your tests that you seem to be comparing “fossil ci” 
to “git commit”, where the fair test would be against “git commit -a && git 
push”.
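That is, since a default “fossil ci” also syncs to the remote, the comparable 
Git invocation has to include the push.  (A sketch; the commit messages and use 
of “time” are illustrative.)

```shell
# Fossil: commit + sync happen in one autosync-enabled step.
time fossil ci -m "tweak"

# Git: the equivalent work is a commit *plus* a push.
time sh -c 'git commit -am "tweak" && git push'
```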

> 1. Git seems to do better at compressing and opening smaller repositories, 
> while Fossil triumphs over larger ones.

Be careful with such comparisons.

Fossil repositories aren’t kept optimally small, since that would increase the 
time for checkins and such.  Every now and then, even after an initial import, 
you want to look into “fossil rebuild” and some of its more advanced options.

This is what I was getting at with my comments about the 39:1 compression ratio 
I’m currently seeing on my largest Fossil repository.  I expect I could make it 
smaller if I did such a rebuild.

I have no idea if Git has some similar “rebuild” feature, though I will 
speculate that the per-file filesystem overheads will eat away at a lot of any 
advantages Git has.  Be sure you’re calculating size-on-disk, not the total 
size of the files alone.  That is, a 1 byte file on a filesystem with a 4K 
block size takes 4K plus a directory entry, not 1 byte.
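You can see the difference with plain POSIX tools; the on-disk figure depends 
on your filesystem’s block size, so treat the “4” below as typical rather than 
guaranteed.

```shell
printf 'x' > tiny.txt    # a 1-byte file
wc -c < tiny.txt         # apparent size: → 1
du -k tiny.txt           # size on disk: typically 4 (KiB) on a 4K-block filesystem
```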

Fossil, by keeping all artifacts in a single file, does not have this overhead. 
 The need for an occasional “rebuild” is probably its closest analog.

> 3. The speed of a commit in Git seems to be dependent on the size of the 
> change. The bigger the changes, naturally the slower the commits. Commits in 
> Fossil seem, with a few discrepancies (notably commit times in repos 1 and 
> 2), to be dependent on the size of the repository.

That’s probably due to the repo-cksum setting, which defaults to “on,” and 
which has no equivalent in Git.  You’ll probably gain a lot of speed by turning 
that off:

    https://www.fossil-scm.org/index.html/help?cmd=settings
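Turning it off is a one-liner per repository.  (The setting name is real; note 
that you are trading away Fossil’s extra commit-time integrity check for 
speed.)

```shell
fossil settings repo-cksum off   # disable the per-commit repository checksum
fossil settings repo-cksum       # show the current value to confirm
```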

> I was very interested to find an article hidden in the depths of the Fossil 
> website 
> (http://fossil-scm.org/index.html/event?name=be8f2f3447ef2ea3344f8058b6733aa08c08336f)

That’s a summary of the NetBSD threads I referred to above.

> are there any plans to optimise Fossil in the future?

My sense is that it depends on people scratching their own itches.  The SQLite, 
Fossil, and Tcl projects don’t need Fossil to be faster, so it’s fine for now.  
If someone wants to come along and make Fossil support huge repositories, I’m 
sure the patches would be thoughtfully reviewed and possibly accepted.

One option not covered in the tech note you found is the possibility of narrow 
and shallow clones:

1. Narrow: The local clone doesn’t contain the history of all assets in the 
remote repository (e.g. just one subdirectory)

2. Shallow: The local clone contains only the most recent history of all assets 
in the remote repository.  With depth=1, you get the effect of old-style VCSes 
like Subversion, except that you have the option to build up more history in 
the local repository as time goes on.

Both would help significantly, but no one has stepped up to do either yet.
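For comparison, Git’s shallow clone is exactly what the second item describes.  
(The URL is a placeholder.)

```shell
# Only the most recent revision of every file, no deeper history.
git clone --depth 1 https://example.org/project.git
```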
_______________________________________________
fossil-users mailing list
fossil-users@lists.fossil-scm.org
http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users
