Re: [fossil-users] Fossil performance and optimisation

2017-08-17 Thread Eduardo Morras
On Fri, 11 Aug 2017 14:10:32 +0100
"Damien Sykes-Lindley"  wrote:

> Hi,
> I was introduced to Fossil about three years ago. It is the first VCS
> I have actually used extensively for my own work and as such I
> haven't really had much to compare it with. However, now I'm in the
> realm of contributing code I am having to learn how to make use of
> other systems as well. For the purposes of my experiments, I have
> been comparing Fossil alongside Git, as this seems to be the popular
> alternative and there is a comparison sheet describing many areas of
> difference between Fossil and Git. For the most part, Fossil has many
> advantages over Git. Though I couldn't help noticing there seemed to
> be a silence on speed comparisons. There is a separate article on
> performance statistics that again doesn't even mention speed.

> Are there any configurations I may have
> missed that may help to optimise Fossil in these areas? If not, are
> there any plans to optimise Fossil in the future? Cheers. Damien.

For big repositories, you should increase the SQLite cache. I don't know
whether Fossil 2.3 still honors the deprecated default_cache_size
pragma/header option, but on the Admin tab you can run "pragma
default_cache_size=2147483648" to set it to 2GB permanently (as far as I
understand, it doesn't accept the negative values that cache_size uses to
give the size in KiB instead of a number of pages).

I used to "hack" the Fossil source code to always use big caches when
importing Git or SVN repositories.

I don't know whether there are other ways to execute pragmas before the
import; each fossil invocation is independent, so setting it with pragma
cache_size won't work.
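To illustrate why a per-invocation pragma doesn't stick, here is a minimal sketch using Python's standard sqlite3 module against a scratch database (not Fossil itself, and not Fossil's own code). In SQLite, cache_size applies only to the connection that sets it; a positive value counts pages and a negative value is a size in KiB. The function name and the 2 GiB figure are illustrative assumptions.

```python
import os
import sqlite3
import tempfile

# PRAGMA cache_size: positive N = N pages, negative N = roughly |N| KiB.
TWO_GIB_IN_KIB = 2 * 1024 * 1024


def cache_size_across_connections(db_path):
    """Set cache_size on one connection, then read it back on a fresh one."""
    first = sqlite3.connect(db_path)
    first.execute(f"PRAGMA cache_size = -{TWO_GIB_IN_KIB}")  # about 2 GiB
    seen_by_first = first.execute("PRAGMA cache_size").fetchone()[0]
    first.close()

    # A new connection (like a new fossil invocation) starts from the
    # compiled-in default again; the setting above is not persisted.
    second = sqlite3.connect(db_path)
    seen_by_second = second.execute("PRAGMA cache_size").fetchone()[0]
    second.close()
    return seen_by_first, seen_by_second


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        before, after = cache_size_across_connections(os.path.join(tmp, "demo.db"))
        print("first connection: ", before)
        print("second connection:", after)
```

That per-connection scope is why a persistent setting (or a source change, as above) is needed for a one-shot import.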

HTH

---   ---
Eduardo Morras 
___
fossil-users mailing list
fossil-users@lists.fossil-scm.org
http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users


Re: [fossil-users] Fossil performance and optimisation

2017-08-11 Thread Warren Young
On Aug 11, 2017, at 2:55 PM, Damien Sykes-Lindley  
wrote:
> 
> so that sounds can easily be edited and changed without loss of quality they 
> are uncompressed PCM data. I suppose I could convert them to FLAC

I must not have been clear: when you cannot use a text-based file format with 
Fossil, you at least want to use an *uncompressed* binary file format.  So, 
stick with PCM, WAV, AIFF, etc.  Absolutely do not switch to FLAC.

If you move to a game engine that wants a compressed audio format — some use 
Ogg Vorbis, for example — my recommendation is that you still store the data in 
Fossil in uncompressed form, but add some build rule that converts it to the 
compressed form for testing and delivery.

Not only will this make Fossil — or really, *any* VCS — happier, it lets you 
change your compression settings, format, etc. without bloating the repository 
with multiple versions.
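A minimal sketch of such a build rule, in Python for illustration: it walks a tree of uncompressed WAV sources and plans an encode step only where the compressed output is missing or older than its source. The directory layout, the function names, and the oggenc command line are all assumptions; substitute whatever encoder your engine needs. Only the uncompressed sources go into Fossil; the output directory stays out of the repository (for example via Fossil's ignore-glob setting).

```python
import subprocess
from pathlib import Path


def plan_encodes(src_root, out_root, src_ext=".wav", out_ext=".ogg"):
    """Return (source, output) pairs whose output is missing or stale."""
    src_root, out_root = Path(src_root), Path(out_root)
    if not src_root.is_dir():
        return []
    jobs = []
    for src in sorted(src_root.rglob("*" + src_ext)):
        # Mirror the source layout under the output directory.
        out = out_root / src.relative_to(src_root).with_suffix(out_ext)
        if not out.exists() or out.stat().st_mtime < src.stat().st_mtime:
            jobs.append((src, out))
    return jobs


def encoder_command(src, out):
    # Assumed encoder (oggenc from vorbis-tools); swap in your engine's tool.
    return ["oggenc", "-q", "5", "-o", str(out), str(src)]


def run_encodes(src_root, out_root):
    """Encode every stale file; stop on the first encoder failure."""
    for src, out in plan_encodes(src_root, out_root):
        out.parent.mkdir(parents=True, exist_ok=True)
        subprocess.run(encoder_command(src, out), check=True)
```

Changing the quality setting or the target format then means editing one line of the build script, not checking in a new copy of every asset.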

I speak from experience.  My largest repository has essentially the same 
graphics for a web app in:

1. 8-bit GIF format from the days before PNG was in all browsers

2. 8-bit PNG format from the days before IE got the ability to cope with alpha 
channels in PNG

3. A different set of 8-bit PNGs when we changed our UI color scheme enough 
that the old 8-bit matting caused color fringing when the old PNGs were 
composited atop the new app background.

4. 24-bit PNG format once we finally dropped support for those old versions of 
IE

If I’d had the foresight to write a script to convert them from the high-res 
PSD or SVG forms to GIFs or PNGs, I’d still only have one version of most of 
these graphics.

> As for the executable, sometimes that gets included due to the fact that we 
> forget to delete it after testing an executable copy and don't use Fossil's 
> ignore feature

You can “shun” those after the fact to strip them out of the repo.  You have to 
get all clones to cooperate for this to work, but it’s doable.

> compiled language like C++ where there are many assets to manage can make 
> compilation a real ballache - maybe that's because I'm so new to such 
> languages.

If your build system is not reliable, I’d say that’s where you should spend 
your efforts, not on trying to get Fossil to cope with checkins containing 
unstripped binaries generated from other files that are efficiently stored in 
the repository.

You probably need some amount of scripting that lets you clone a repository and 
then run the script to get things into a buildable shape, automatically and 
reliably.  Anything less becomes the PITA you’ve run into.

Windows is a bit of an outlier here: on all the other major platforms, we have 
powerful, scriptable build systems, which we had to build in order to deal with 
the complexity of platform differences.  A positive side effect of that is that 
we can script our way out of most build complexities.

I pity the Windows dev that is forced to choose between batch files and 
PowerShell to achieve the same end.

> I had no idea about the checksum setting. Despite three years of usage, it 
> seems I've only scratched the surface of Fossil. That's another reason why I 
> put some of my thoughts out there: there are users, and of course the 
> developers themselves, who are much more knowledgeable on these things than I am.

We now await a repeat of your tests. :)



Re: [fossil-users] Fossil performance and optimisation

2017-08-11 Thread Damien Sykes-Lindley

Hi Warren,
Very interesting. As I said, I don't understand how these things work 
internally; I just thought I'd put it out here and see what the more 
knowledgeable people thought of these comparisons. I just know that it 
records artefact changes; I've no idea how the compression or local/remote 
sync works, or what repercussions would result in either system.


I ought to clarify that the games I work on are based on sound rather than 
graphics, and so that sounds can easily be edited and changed without loss 
of quality, they are stored as uncompressed PCM data. I suppose I could 
convert them to FLAC, but given that the game engine doesn't support this 
format, they would have to be converted back to PCM before every test.
As for the executable, it sometimes gets included because we forget to 
delete it after testing an executable copy and don't use Fossil's ignore 
feature. To be fair, though, with some of my non-game projects I do include 
executables and binary libraries deliberately, simply because using a 
mainstream compiled language like C++, where there are many assets to 
manage, can make compilation a real ballache - maybe that's because I'm so 
new to such languages. If I include the executable, I can always get the 
latest without having to spend hours trying to figure out how to recompile 
the thing.
Same with libraries: some libraries that I use are proprietary, so 
including them in source form isn't an option. Especially in an interpreted 
language, where I can simply run the code from the source script, I always 
include the libraries if the code needs them to run. Again, maybe a mistake 
on my part, but I always like to be prepared when I am checking out a new 
branch or version to test or work with.


The Fossil and Git commit tests were both local, with no remotes attached, 
so neither had a speed advantage over the other due to network usage.


I had no idea about the checksum setting. Despite three years of usage, it 
seems I've only scratched the surface of Fossil. That's another reason why I 
put some of my thoughts out there: there are users, and of course the 
developers themselves, who are much more knowledgeable on these things than I am.

Cheers.
Damien.
-----Original Message-----
From: Warren Young

Sent: Friday, August 11, 2017 9:08 PM
To: Fossil SCM user's discussion
Subject: Re: [fossil-users] Fossil performance and optimisation


Re: [fossil-users] Fossil performance and optimisation

2017-08-11 Thread Warren Young
On Aug 11, 2017, at 7:10 AM, Damien Sykes-Lindley  
wrote:
> 
> I couldn't help noticing there seemed to be a silence on speed comparisons.

There have been many threads on this over the years.  Just for a start, search 
the list archives for “NetBSD”. 

> After cloning and working with several publicised Fossil repositories, I 
> can't help but notice that the majority of them are rather small.

Yes, that’s best practice for most any DVCS.  Even the Linux project takes this 
philosophy:

http://blog.ffwll.ch/2017/08/github-why-cant-host-the-kernel.html

That is, the single monolithic repo you see when cloning “the Linux Git repo” 
is something of an illusion, which the developers of Linux don’t actually deal 
with very much.

> Most of the projects that I am involved with are games...Of course these will 
> contain binary files

That “of course” needn’t be a foregone conclusion.  

Many asset formats are available in text forms, which are friendly for use in 
version control systems.  For example, you may be able to store 3D models in 
the repository in COLLADA format and some 2D assets in SVG.

For the bitmapped textures, it’s better to store those as uncompressed bitmap 
formats, then compress them during the build process to whatever format you’ll 
use within the game engine and for distribution.

A 1-pixel change to a Windows BMP file causes a much smaller change to the size 
of a Fossil repository than does a 1-pixel change to a JPEG or PNG, because 
that 1 pixel difference can throw off the whole rest of the compression 
algorithm, causing much of the rest of the file to change.
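The effect is easy to demonstrate with zlib, which implements the same DEFLATE compression PNG uses. This is a minimal sketch, not Fossil's actual delta algorithm: it uses the number of bytes between the common prefix and common suffix of two versions as a crude proxy for delta size, and synthetic data stands in for an uncompressed "box of pixels". The function names and data sizes are illustrative assumptions.

```python
import random
import zlib


def changed_span(old: bytes, new: bytes) -> int:
    """Bytes between the common prefix and common suffix of two versions."""
    limit = min(len(old), len(new))
    prefix = 0
    while prefix < limit and old[prefix] == new[prefix]:
        prefix += 1
    suffix = 0
    while (suffix < limit - prefix
           and old[len(old) - 1 - suffix] == new[len(new) - 1 - suffix]):
        suffix += 1
    return max(len(old), len(new)) - prefix - suffix


def compare_one_byte_edit(size=100_000, seed=0):
    """Edit one byte, then measure the change before and after compression."""
    rng = random.Random(seed)
    raw = bytes(rng.choice(b"abcd") for _ in range(size))  # stand-in pixels
    edited = bytearray(raw)
    edited[10] = ord("z")  # the one-"pixel" change
    edited = bytes(edited)
    raw_delta = changed_span(raw, edited)
    packed_delta = changed_span(zlib.compress(raw), zlib.compress(edited))
    return raw_delta, packed_delta


if __name__ == "__main__":
    raw_delta, packed_delta = compare_one_byte_edit()
    print(f"uncompressed: {raw_delta} byte(s) changed")
    print(f"compressed:   {packed_delta} byte(s) changed")
```

In the uncompressed version the change stays one byte wide; in the compressed version it ripples through the Huffman tables and the bit stream to the end of the output, so nearly the whole compressed file has to be stored again.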

This can be tricky to manage.  You might think TIFF is a good file format for 
this purpose, but you’re forgetting all the metadata in it that changes simply 
when a file is opened and re-saved.  (Timestamps, GUIDs, etc.)  It’s better to 
go with a bare “box of pixels” format like Windows BMP.
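To make the "box of pixels" point concrete, here is a minimal sketch of a 24-bit Windows BMP writer in Python. Every header field is derived from the width, height and pixel data, and there is no timestamp or GUID anywhere, so saving the same image twice gives byte-identical files. The function name and the rows-of-(r, g, b)-tuples input are illustrative choices, not any particular tool's API.

```python
import struct


def encode_bmp(pixels):
    """Encode rows of (r, g, b) tuples, top row first, as a 24-bit BMP."""
    height = len(pixels)
    width = len(pixels[0]) if height else 0
    row_size = (width * 3 + 3) & ~3          # rows are padded to 4 bytes
    pad = bytes(row_size - width * 3)
    body = bytearray()
    for row in reversed(pixels):             # BMP stores rows bottom-up
        for r, g, b in row:
            body += bytes((b, g, r))         # ...and pixels as B, G, R
        body += pad
    offset = 14 + 40                         # file header + info header
    file_header = struct.pack("<2sIHHI", b"BM", offset + len(body), 0, 0, offset)
    info_header = struct.pack(
        "<IiiHHIIiiII",
        40, width, height, 1, 24,            # header size, dims, planes, bpp
        0, len(body),                        # BI_RGB (uncompressed), data size
        2835, 2835,                          # ~72 DPI, in pixels per metre
        0, 0,                                # no palette
    )
    return file_header + info_header + bytes(body)
```

Changing one pixel changes only that pixel's three bytes (or fewer), which is exactly the kind of edit delta compression handles well.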

All of this does make the checkout size bigger, but Fossil’s delta compression 
has two positive consequences here:

1. The Fossil repository size will probably be as small or even smaller.  A 
zlib-compressed Windows BMP file is going to be about the same size as a PNG 
file with the same content.

2. If those files are changed multiple times between initial creation and 
product ship time, the delta compression will do a far better job if the input 
data isn’t already compressed.  This is how you get the high compression ratios 
you see on most Fossil repositories by visiting their /stat page.  My biggest 
repository is rocking along at 39:1 compression ratio, and it hasn’t been 
rebuilt and recompressed lately.

> (generally an executable

Why would you include generated files in a version control repository?

Fossil is not a networked file system.  If you try to treat it like one, it 
will take its revenge on you.

> dependency libraries

In source code form only, perhaps.

Even then, it’s better to hold those in separate repositories.

It would be nice if Fossil had a sub-modules feature like Git to help with 
this, so that opening the main repository also caused sub-Fossils to be cloned 
and opened in subdirectories.  Meanwhile, you have to do manual “fossil open 
--nested” commands, but it’s a one-time hassle.
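Until something like sub-modules exists, that one-time hassle can at least be scripted. A minimal sketch in Python, assuming a hypothetical list of (subdirectory, clone URL) pairs kept in the main repository: it clones each sub-repository to a .fossil file and opens it inside the main checkout with "fossil open --nested". The URLs, paths and function names are placeholders; nested_open_plan only builds the command list, and run_plan executes it.

```python
import subprocess
from pathlib import Path

# Hypothetical sub-projects: (subdirectory in the checkout, clone URL).
SUBREPOS = [
    ("lib/audio", "https://example.com/audio-lib"),
    ("lib/ui", "https://example.com/ui-lib"),
]


def nested_open_plan(checkout_root, repo_store, subrepos=SUBREPOS):
    """Return (working_dir, argv) steps to clone and nest-open each sub-repo."""
    root = Path(checkout_root).resolve()
    store = Path(repo_store).resolve()
    steps = []
    for subdir, url in subrepos:
        # Assumes sub-repo basenames are unique.
        repo_file = store / (Path(subdir).name + ".fossil")
        steps.append((store, ["fossil", "clone", url, str(repo_file)]))
        # --nested allows opening a checkout inside another open checkout.
        steps.append((root / subdir,
                      ["fossil", "open", "--nested", str(repo_file)]))
    return steps


def run_plan(steps):
    """Run each step in its working directory; stop on the first failure."""
    for cwd, argv in steps:
        Path(cwd).mkdir(parents=True, exist_ok=True)
        subprocess.run(argv, cwd=cwd, check=True)
```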

Nested checkins would also be nice.  That is, if a file changes in a nested 
checkout, a “fossil ci” from the top level should offer to check in the changes 
on the sub-project.

> Also note that all commits were tests only and so weren't synced to remotes. 
> Naturally this means that commits are even slower when syncing.

It also means that local differences are a smaller percentage of the total time 
taken for many operations, since the time may be swamped by network I/O.

For instance, I notice in your tests that you seem to be comparing “fossil ci” 
to “git commit”, where the fair test would be against “git commit -a && git 
push”.
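If you rerun the comparison that way, a minimal timing harness might look like the Python sketch below. Each entry is a sequence of commands timed as one unit, so "git commit -a" plus "git push" is measured against "fossil ci", which with autosync on (Fossil's default) also syncs to the remote. The command lines are examples only; run each in its own checkout after making the same edit in both.

```python
import subprocess
import time


def time_sequence(commands, cwd=None):
    """Run commands in order (stopping on failure); return elapsed seconds."""
    start = time.perf_counter()
    for argv in commands:
        subprocess.run(argv, cwd=cwd, check=True)
    return time.perf_counter() - start


# Example pairings, assuming a remote is configured for both checkouts.
FOSSIL_STEPS = [["fossil", "ci", "-m", "timing test"]]
GIT_STEPS = [["git", "commit", "-a", "-m", "timing test"], ["git", "push"]]
```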

> 1. Git seems to do better at compressing and opening smaller repositories, 
> while Fossil triumphs over larger ones.

Be careful with such comparisons.

Fossil repositories aren’t kept optimally small, since that would increase the 
time for checkins and such.  Every now and then, even after an initial import, 
you want to look into “fossil rebuild” and some of its more advanced options.

This is what I was getting at with my comments about the 39:1 compression 
ratio I’m currently seeing on my largest Fossil repository.  I expect I could 
make it smaller if I did such a rebuild.

I have no idea if Git has some similar “rebuild” feature, though I will 
speculate that the per-file filesystem overheads will eat away at much of any 
advantage Git has.  Be sure you’re calculating size-on-disk, not the total 
size of the files alone.  That is, a 1 byte file on a filesystem with a 4K 
block size takes 4K plus a directory entry, not 1 byte.
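The arithmetic is easy to sketch. This Python sketch (function names are illustrative) rounds each file up to whole blocks, assuming a 4K block size, and on POSIX systems also sums the actual allocation from st_blocks, which is reported in 512-byte units. It ignores directory entries and inodes, so it still understates the cost of many small files.

```python
import os


def allocated_size(nbytes, block_size=4096):
    """Bytes a file of nbytes occupies when rounded up to whole blocks."""
    return -(-nbytes // block_size) * block_size  # ceiling division


def tree_sizes(root, block_size=4096):
    """Return (apparent, estimated_on_disk, reported_on_disk_or_None)."""
    apparent = estimated = reported = 0
    have_blocks = True
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            apparent += st.st_size
            estimated += allocated_size(st.st_size, block_size)
            if hasattr(st, "st_blocks"):
                reported += st.st_blocks * 512  # POSIX: 512-byte units
            else:
                have_blocks = False  # e.g. Windows: no st_blocks field
    return apparent, estimated, (reported if have_blocks else None)
```

Running tree_sizes on a .git directory and comparing the first two numbers shows how much the per-file overhead adds.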

Fossil, by keeping all artifacts in a single file, does not have this overhead. 
 The “rebuild” problem is