It appears that the compression ratio shown by the ‘fossil db --db-check’
command compares the actual total file size of the repo against the would-be
size of all stored versions expanded separately (based on the description here:
https://www.fossil-scm.org/xfer/doc/trunk/www/stats.wiki).

There are two cases, however, where IMO the command gives a false impression
of the actual compression achieved in the repo.

* The first is the inclusion of un-versioned files, which inflate the total
file size but play no part in versioning, and versioning is what I believe
the compression ratio was meant to highlight.

* The second is the presence of free pages not yet vacuumed. This is unused
space that IMO ‘unfairly’ lowers the ratio.
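To make the free-page point concrete, here is a small illustration (not Fossil-specific; it just uses SQLite's standard pragmas, which a Fossil repo file also answers to). Deleting rows leaves pages on the freelist, so the file stays large even though that space holds no versioned content:

```python
import os
import sqlite3
import tempfile

# Create a throwaway SQLite database, fill it, then delete everything.
# Without a VACUUM, the deleted pages stay in the file as free pages.
path = os.path.join(tempfile.mkdtemp(), "demo.sqlite")
con = sqlite3.connect(path)
con.execute("CREATE TABLE blobs(x)")
con.executemany("INSERT INTO blobs VALUES (?)",
                [("x" * 10000,) for _ in range(100)])
con.commit()
con.execute("DELETE FROM blobs")  # rows gone, pages moved to the freelist
con.commit()

page_size = con.execute("PRAGMA page_size").fetchone()[0]
page_count = con.execute("PRAGMA page_count").fetchone()[0]
free_pages = con.execute("PRAGMA freelist_count").fetchone()[0]

file_bytes = page_count * page_size
free_bytes = free_pages * page_size
print(f"file: {file_bytes} bytes, of which free: {free_bytes} bytes")
```

A ratio computed against the raw file size counts those free bytes as if they were compressed history; subtracting `free_pages * page_size` would remove that distortion.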

From the wiki page cited above: “... hence the SQLite project gets excellent
73:1 compression”.

If we were to add several big un-versioned files (such as an assortment of
pre-built binaries for various configurations and platforms), the repo size
would obviously increase, ‘unfairly’ (IMO) dropping the ‘excellent’
compression ratio, when in fact it hasn’t changed at all with respect to the
versioned history.

So, in practice, the compression ratio is not meaningful in a useful way when
the repo includes either big un-versioned files or too many free pages, and
it could be improved by excluding those two contributions from the
computation.
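The proposed adjustment amounts to simple arithmetic. A hedged sketch, using made-up example numbers (not real fossil output), where `free_page_bytes` and `unversioned_bytes` stand for the two contributions discussed above:

```python
def compression_ratio(uncompressed_bytes, repo_bytes,
                      free_page_bytes=0, unversioned_bytes=0):
    """Ratio of expanded artifact size to the space actually used for them.

    Hypothetical adjusted computation: subtract free pages and
    un-versioned content from the repo file size before dividing.
    """
    used = repo_bytes - free_page_bytes - unversioned_bytes
    return uncompressed_bytes / used

# Example: 7.3 GB of expanded history in a 100 MB repo gives 73:1.
naive = compression_ratio(7_300_000_000, 100_000_000)

# Adding 300 MB of un-versioned binaries drags the naive ratio down
# to 18.25:1 even though the versioned history is unchanged.
skewed = compression_ratio(7_300_000_000, 400_000_000)

# The adjusted ratio ignores the un-versioned bytes: back to 73:1.
adjusted = compression_ratio(7_300_000_000, 400_000_000,
                             unversioned_bytes=300_000_000)

print(naive, skewed, adjusted)
```

The same subtraction would apply to free-page space, so a vacuum-pending repo and a freshly vacuumed one would report the same ratio.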

Your thoughts?
_______________________________________________
fossil-users mailing list
fossil-users@lists.fossil-scm.org
http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users
