On 11/27/15, Warren Young <[email protected]> wrote:
> So, what did that fix actually do? What was taking up the extra space,
> which VACUUM did not find?

The command does extra delta-compression.

Fossil does not store the complete text of every version of every file. Instead, it tries to store the complete text of just the most recent check-in, and then older versions of the file are stored as (small) deltas of the next newer version. A "delta" is a short specification of how to transform one file into another. (See https://www.fossil-scm.org/fossil/doc/df302de88a4c54be7727/www/delta_format.wiki for additional information.)

Actually, the default behavior is to store the complete text of the tip of every branch in the directed acyclic graph (DAG) that describes the complete history of each file. Then when you run "fossil rebuild --compress" (or --compress-only) it goes back and does additional delta encoding for the tips of all branches other than trunk. So if you have a project with a lot of branches, running --compress might do a lot of additional delta compression, and hence save a lot of space.

Research problem: Figure out a good algorithm to get Fossil to apply more aggressive delta encoding on the tips of branches so that --compress becomes superfluous.

Another research problem: Currently, Fossil only delta-compresses versions of the same file. Enhance --compress so that it detects different files that happen to share a lot of content and delta-encodes them against one another.

Additional background information: Most other version control systems, and especially older version control systems (RCS, CVS), have a very specific way in which they apply deltas. Usually, the tip is stored full-text and then all earlier versions are deltas of their children. That method works well, and it is the default method used by Fossil. But Fossil is much more flexible. Fossil is able to store any version of any file as a delta of any other version of any other file (as long as there are no delta-loops).

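To make the storage model above concrete, here is a minimal Python sketch. It is not Fossil's code and it does not use Fossil's delta format: the names make_delta, apply_delta, and Store are hypothetical, and the toy delta is just a list of copy/insert commands built with Python's difflib. It only illustrates the idea that any artifact can be stored either full-text or as a delta against any other artifact, provided the delta chain never loops back on itself.

```python
import difflib


def make_delta(source, target):
    """Toy delta (NOT Fossil's format): commands that turn source into target."""
    ops = []
    matcher = difflib.SequenceMatcher(a=source, b=target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2 - i1))       # reuse bytes from source
        elif tag in ("replace", "insert"):
            ops.append(("insert", target[j1:j2]))   # literal new text
        # "delete" needs no command: that source range is simply not copied
    return ops


def apply_delta(source, ops):
    """Rebuild the target text from the source text and a toy delta."""
    out = []
    for op in ops:
        if op[0] == "copy":
            _, start, length = op
            out.append(source[start:start + length])
        else:
            out.append(op[1])
    return "".join(out)


class Store:
    """Hypothetical artifact store: each id is held full-text or as a delta."""

    def __init__(self):
        self.full = {}    # id -> complete text
        self.delta = {}   # id -> (base id, toy delta against the base)

    def add_full(self, rid, text):
        self.full[rid] = text
        self.delta.pop(rid, None)

    def content(self, rid):
        """Follow the delta chain down to a full-text artifact, then replay it."""
        chain = []
        while rid not in self.full:
            base, ops = self.delta[rid]
            chain.append(ops)
            rid = base
        text = self.full[rid]
        for ops in reversed(chain):
            text = apply_delta(text, ops)
        return text

    def would_loop(self, rid, base):
        """True if storing rid as a delta of base would create a delta-loop."""
        cur = base
        while True:
            if cur == rid:
                return True
            if cur not in self.delta:
                return False
            cur = self.delta[cur][0]

    def deltify(self, rid, base):
        """Re-store rid as a delta against base, refusing delta-loops."""
        if self.would_loop(rid, base):
            raise ValueError("delta-loop: %s already depends on %s" % (base, rid))
        text = self.content(rid)
        ops = make_delta(self.content(base), text)
        self.delta[rid] = (base, ops)
        self.full.pop(rid, None)
```

With three versions of a file, calling deltify("v1", "v2") and deltify("v2", "v3") gives the default-style layout (newest version full-text, older ones as deltas of the next newer version), and content("v1") still returns the original text. Asking for deltify("v3", "v1") at that point raises ValueError, because v1 already depends on v3 through v2.
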
So Fossil could, if you want, store the complete text of the first version of every file and then store subsequent revisions as deltas - the inverse of the default encoding. Or it could limit the length of delta chains to some constant - say 20. (I think Hg does this, doesn't it?)

There are lots of possibilities here, and the Fossil file format is able to support them all. In particular, you could take an existing Fossil repo, redo all the deltas using some completely new mapping, and the repo would still continue to work fine. This flexibility allows for a lot of experimentation with the delta encoding without breaking legacy.

--
D. Richard Hipp
[email protected]

