> From: John Fisher <fishook2...@gmail.com>
> FYI we are archiving compressed Linux disk images for VMs and
A core problem is that you've got the worst sort of data for something
like Git. Your files are huge, and because they're already compressed,
any attempt to compress stored versions or to find duplicate strings
between them is wasted. Your workload is anti-optimized for any "source
management" tool.
Here's something that might work (ugh): Use Subversion, which I seem
to recall will do "delta encoding" between versions of a single file
but not *between* files. Have a directory (or directories) which
contain all the big files. Whenever you change a big file, delete the
old version and create the new version *under a different name*, using
a plain filesystem delete-and-create rather than "svn mv", so that
Subversion doesn't try to delta-encode the new version relative to the
old one or otherwise connect different versions of the binary. Now, for
your "real" files, keep a directory tree like normal, but for each of
the big files, use a symbolic link (under the desired name) that points
to the actual file off in the storage directory. I *think* that will
prevent Subversion from trying to do anything clever with big,
low-redundancy binary files.
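A minimal sketch of the filesystem side of that dance (the svn add/rm/commit
calls are omitted, and every path and name here is made up for illustration):

```shell
#!/bin/sh
set -e
# Hypothetical layout: big files live in blobstore/ under version-unique
# names; the Subversion working tree holds only symlinks to them.
mkdir -p repo/blobstore repo/images
printf 'fake image v1' > disk.img          # stand-in for a real disk image
# Initial version, stored under a unique name, linked under the real name:
cp disk.img repo/blobstore/disk.img.v1
ln -s ../blobstore/disk.img.v1 repo/images/disk.img
# On update: plain "rm" plus a NEW name, not "svn mv", so Subversion
# never tries to delta the new blob against the old one.
printf 'fake image v2' > disk.img
rm repo/blobstore/disk.img.v1
cp disk.img repo/blobstore/disk.img.v2
ln -sfn ../blobstore/disk.img.v2 repo/images/disk.img
```

The working tree path (repo/images/disk.img) stays stable across versions;
only the blob name and the link target change.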
You could probably write a script that would go through the structure
and groom it into the proper shape to be committed: Move any big
files in the "real" tree into the storage directory, replacing them
with links, deleting any non-linked-to files in the storage directory,
etc. The trick would be having a way to generate the name in the
storage directory in a way that is uniquely determined by the file
contents (and possibly modification date). You don't want to hash the
whole file; that would be too slow...
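One cheap way to get such a name (a sketch, not anything Subversion
provides: the size + mtime + first-64-KiB-hash scheme and the `partial_id`
name are my own invention, and a partial hash can in principle collide):

```shell
#!/bin/sh
# partial_id FILE: derive a storage name from the file's size, mtime,
# and a hash of only its first 64 KiB, so huge images are never read
# end-to-end. (Hypothetical scheme; adjust the cutoff to taste.)
partial_id() {
    size=$(wc -c < "$1")
    mtime=$(date -r "$1" +%s)
    head_hash=$(head -c 65536 "$1" | sha256sum | cut -d' ' -f1)
    printf '%s-%s-%s\n' "$size" "$mtime" "$head_hash"
}
# Usage (paths hypothetical): mv big.img "blobstore/$(partial_id big.img)"
```

Including the mtime means an in-place rewrite with identical leading bytes
still gets a fresh name, at the cost of renaming files that were merely
touched.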
You received this message because you are subscribed to the Google Groups "Git
for human beings" group.