On Fri, 16 May 2014 10:43:20 -0400
wor...@alum.mit.edu (Dale R. Worley) wrote:

Sorry for replying to your message rather than the OP's.

> > FYI we are archiving compressed Linux disk images for VMs and
> > hypervisors.
> 
> A core problem is that you've got the worst sort of data for something
> like Git.  Your files are huge, and being compressed, any effort to
> compress saved files or find duplicate strings between them is totally
> wasted.  Your workload is anti-optimized for any "source management
> system".
> 
> Here's something that might work (ugh):  Use Subversion, which I seem
> to recall will do "delta encoding" between versions of a single file
> but not *between* files.
[...]

Mercurial does this as well.  On the other hand, IIRC, after N
revisions it stores something like a "full checkpoint" to make
reconstructing past revisions faster.
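
To illustrate the checkpoint idea, here is a minimal Python sketch.  It
is NOT Mercurial's actual storage format: the delta encoding (common
prefix/suffix trimming), the `RevisionStore` class and the value of
`CHECKPOINT_EVERY` are all made up for the example.  It only shows why
a periodic full copy bounds the number of deltas to replay.

```python
# Illustrative only: not Mercurial's real on-disk format.
CHECKPOINT_EVERY = 4  # hypothetical value for "N"


def make_delta(old: bytes, new: bytes):
    """Encode `new` relative to `old` as (prefix_len, suffix_len, middle)."""
    limit = min(len(old), len(new))
    prefix = 0
    while prefix < limit and old[prefix] == new[prefix]:
        prefix += 1
    suffix = 0
    while (suffix < limit - prefix
           and old[len(old) - 1 - suffix] == new[len(new) - 1 - suffix]):
        suffix += 1
    return (prefix, suffix, new[prefix:len(new) - suffix])


def apply_delta(old: bytes, delta) -> bytes:
    """Rebuild the newer revision from the older one plus a delta."""
    prefix, suffix, middle = delta
    return old[:prefix] + middle + old[len(old) - suffix:]


class RevisionStore:
    """Stores one file's history as deltas with periodic full copies."""

    def __init__(self):
        self.entries = []  # ("full", bytes) or ("delta", tuple)

    def add(self, data: bytes) -> int:
        rev = len(self.entries)
        if rev % CHECKPOINT_EVERY == 0:
            # Full checkpoint: costs space, but caps the replay length.
            self.entries.append(("full", data))
        else:
            previous = self.get(rev - 1)
            self.entries.append(("delta", make_delta(previous, data)))
        return rev

    def get(self, rev: int) -> bytes:
        # Start at the nearest checkpoint at or before `rev`,
        # then replay at most CHECKPOINT_EVERY - 1 deltas.
        base = rev - rev % CHECKPOINT_EVERY
        data = self.entries[base][1]
        for i in range(base + 1, rev + 1):
            data = apply_delta(data, self.entries[i][1])
        return data
```

Without the checkpoints, fetching revision k would mean replaying k
deltas starting from revision 0.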

I think the OP is better off using something like rsnapshot [1] or
rdiff-backup [2] for his task, or `rsync -H --no-inc-recursive` +
`cp -alR` and a bit of shell scripting.  These tools provide file-level
(in fact, inode-level) deduplication by hardlinking unchanged files.
Dirvish and unison come to mind as well (I'm too lazy to google the
links to their sites, sorry).
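
The hardlinking idea can be sketched in a few lines of Python.  This is
only an illustration under my own assumptions, not how rsnapshot or
rsync are implemented: the `snapshot` function and its arguments are
hypothetical, it compares file contents byte by byte (the real tools
normally look at size and mtime), and it handles regular files only.

```python
import filecmp
import os
import shutil


def snapshot(source, previous, target):
    """Copy `source` into `target`, hardlinking files that are
    unchanged since the `previous` snapshot (pass None for the first
    run).  Returns (linked, copied) counts."""
    linked = copied = 0
    for root, _dirs, files in os.walk(source):
        rel = os.path.relpath(root, source)
        os.makedirs(os.path.normpath(os.path.join(target, rel)),
                    exist_ok=True)
        for name in files:
            src = os.path.join(root, name)
            new = os.path.normpath(os.path.join(target, rel, name))
            old = None
            if previous is not None:
                old = os.path.normpath(os.path.join(previous, rel, name))
            if (old is not None and os.path.isfile(old)
                    and filecmp.cmp(src, old, shallow=False)):
                # Unchanged: share the inode with the older snapshot.
                os.link(old, new)
                linked += 1
            else:
                # New or modified: store a fresh copy.
                shutil.copy2(src, new)
                copied += 1
    return linked, copied
```

Both snapshot directories must live on the same filesystem, since hard
links cannot cross filesystems.  Each snapshot looks like a full copy,
but an unchanged image takes up disk space only once.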

Another approach is to use a backup tool which performs block-level
deduplication.  For this, I can name obnam [3] and ZFS (snapshotting
with block-level dedup turned on).
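
For the curious, block-level deduplication boils down to something like
the Python sketch below.  It is a toy model, not obnam's or ZFS's
actual design: `BlockStore`, the fixed `BLOCK_SIZE` and the choice of
SHA-256 are my assumptions for the example.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical; real tools pick their own sizes


class BlockStore:
    """Keeps each distinct block once, keyed by its SHA-256 digest."""

    def __init__(self):
        self.blocks = {}  # digest -> block bytes

    def put(self, data: bytes):
        """Store `data`; return the list of digests that rebuilds it."""
        recipe = []
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            # A block that was seen before costs no extra storage.
            self.blocks.setdefault(digest, block)
            recipe.append(digest)
        return recipe

    def get(self, recipe) -> bytes:
        """Reassemble the original data from a recipe."""
        return b"".join(self.blocks[digest] for digest in recipe)
```

Two images that differ in a single block then share every other block
in the store.  Note that this only pays off when the images really do
contain identical blocks, which compressed images often will not.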

I'm also not sure whether other folks have mentioned this, but there
are bup [4] and boar [5], which build on VCS paradigms but are
tailored to the needs of working with big binary files.  The bup
design document [6] is particularly insightful.

1. http://www.rsnapshot.org/
2. http://www.nongnu.org/rdiff-backup/
3. http://obnam.org/
4. https://github.com/bup/bup
5. https://code.google.com/p/boar
6. https://github.com/bup/bup/blob/master/DESIGN

-- 
You received this message because you are subscribed to the Google Groups "Git 
for human beings" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to git-users+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
