On Thursday 25 June 2009 22:53:30 Bob Clancy wrote:
> As I'm reading the beginning of the OReilly Git book [...]

This reminds me that I wanted to thank Andy Oram for putting the draft of the 
Git book up for download back in April. While it had a fair number of typos, 
unfinished graphics, and inconsistencies, it was nevertheless useful. Reading 
that preview, by the way, only made me want to own the final version, which I 
added to my Safari Bookshelf as soon as it was out and refer to often.

The only thing I regret is not being quick enough to buy the full PDF that was 
offered for sale a few weeks before the release. I found the draft PDF useful 
when I didn't have a network connection, and it's also a faster and more 
comfortable interface than the Safari website (which is not to say the latter 
is bad). My subscription level (10-slot Basic bookshelf) doesn't include PDF 
downloads, and I can't find the full PDF for sale on the O'Reilly website 
either, darn it!

> "The Venti Filesystem," (Plan 9), Bell Labs,
> http://www.usenix.org/events/fast02/quinlan/quinlan_html/index.html
>
> Now I see how Git can track blocks of code that are moved - very nice!

Reading that paper reminds me of a very similar application of cryptographic 
fingerprints/digests/hashes/checksums: hardlink-based hard drive backup 
software. At home I use storeBackup (http://storebackup.org/), which is 
written in Perl, for daily backups.

At work I used BackupPC (http://backuppc.sourceforge.net/), also written in 
Perl. It has a more sophisticated interface, including self-service restores, 
which was a big bonus for me, since users didn't have to pound on my door when 
they deleted something by accident.

Both systems pool identical files, based on their MD5 fingerprints, by 
hardlinking them, and use compression on top of that, saving a lot of storage 
while doing away with the mental rigamarole of mixing full, incremental, and 
differential backups. From the user's point of view they are bona-fide full 
backups, and that's it. It looks like the latest versions of storeBackup also 
chunk large files, à la Venti.
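
The pooling trick is simple enough that a toy version fits in a screenful 
of Perl. Here is a minimal sketch of the fingerprint-and-hardlink step (the 
pool path and function names are made up, and the real tools also handle 
compression, collision checks, and metadata, but it shows the core idea):

  #!/usr/bin/perl
  # Toy fingerprint pooling: keep one copy of each unique file in a
  # pool directory keyed by its MD5, and hardlink every backup-tree
  # entry to the pooled copy. Illustration only.
  use strict;
  use warnings;
  use Digest::MD5;
  use File::Copy qw(copy);

  my $pool = '/backup/pool';    # hypothetical pool location

  sub md5_of {
      my ($file) = @_;
      open my $fh, '<', $file or die "open $file: $!";
      binmode $fh;
      return Digest::MD5->new->addfile($fh)->hexdigest;
  }

  # Place $src into the backup tree at $dst, pooling duplicates.
  sub store_pooled {
      my ($src, $dst) = @_;
      my $pooled = "$pool/" . md5_of($src);
      copy($src, $pooled) or die "copy: $!" unless -e $pooled;
      link $pooled, $dst or die "link $pooled -> $dst: $!";
  }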

As an example, the /home directory on my home server is ~650 GB (it contains 
~440 GB of music files, mostly FLAC). I have 48 full backups of it going back 
to 2005 at increasing intervals, and they take up ~560 GB. Naively, 48 copies 
would run to something like 48 x 650 GB, roughly 31 TB (less in practice, 
since the tree was smaller in the earlier years), so the pooling and 
compression are buying well over an order of magnitude. Not bad.

One difference from git (I know this is a gross approximation; just squint 
hard enough) is that git explicitly tracks and persists file-duplication 
information, while storeBackup and BackupPC let the filesystem take care of 
that (although both keep lists of fingerprints so they don't have to be 
recomputed for every past backup on every run).
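
To make the git side concrete: a blob's ID is just the SHA-1 of its content 
prefixed with a short header, so two identical files map to the same blob 
and get stored once, with no hardlinks involved. A few lines of Perl 
reproduce what git hash-object computes:

  #!/usr/bin/perl
  # Compute a git blob ID by hand: SHA-1 of "blob <size>\0<content>".
  # Identical content always yields the same ID, which is why git
  # stores duplicate files only once.
  use strict;
  use warnings;
  use Digest::SHA qw(sha1_hex);

  my $content = "hello world\n";
  print sha1_hex("blob " . length($content) . "\0" . $content), "\n";
  # prints the same ID as: echo 'hello world' | git hash-object --stdin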

So now I'm wondering whether git could be used as a backup system. Since it 
features coalescing of duplicate data and also uses compression (zlib, I 
think), the storage efficiency should be very similar. The question is what 
the speed would be like. My dailies take around one hour under normal 
circumstances (not a lot of large changes in the home dir). Sounds like an 
experiment in the works... 
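
In case anyone beats me to it, the crude version of the experiment is just 
one commit per run out of cron. Something along these lines (an untested 
sketch; the target path is made up and caveats abound):

  #!/usr/bin/perl
  # Crude git-as-backup sketch: snapshot a directory as one commit
  # per run. Untested; a real setup would want ignore rules, a
  # remote mirror, and some thought about ownership and permissions,
  # of which git preserves only the executable bit.
  use strict;
  use warnings;
  use POSIX qw(strftime);

  my $dir = '/home';    # hypothetical backup target
  chdir $dir or die "chdir $dir: $!";

  unless (-d '.git') {
      system('git', 'init') == 0 or die "git init failed";
  }
  system('git', 'add', '-A') == 0 or die "git add failed";
  my $msg = strftime('backup %Y-%m-%d %H:%M', localtime);
  system('git', 'commit', '-m', $msg) == 0
      or warn "git commit failed (nothing changed?)\n";

One thing to keep in mind is that, unlike the hardlink tools, git keeps 
everything zlib-compressed inside its object store, so a restore means 
checking files out rather than just copying them from a backup tree.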


Bernardo
