On Thursday 25 June 2009 22:53:30 Bob Clancy wrote:
> As I'm reading the beginning of the OReilly Git book [...]
This reminds me that I wanted to thank Andy Oram for putting the draft of the Git book up for download back in April. While it had a fair number of typos, unfinished graphics, and inconsistencies, it was nevertheless useful. Reading that preview, by the way, only made me want to own the final version, which I added to my Safari Bookshelf as soon as it was out and refer to often. The only thing I regret is not being quick enough to buy the full PDF that was offered for sale a few weeks before the release. I found that having the draft PDF was useful when I didn't have a network connection, and it's also a faster and more comfortable interface than the Safari website (which is not to say the latter is bad). My subscription level (10-slot Basic bookshelf) doesn't include PDF downloads, and I can't find the full PDF for sale on the O'Reilly website either, darn it!

> "The Venti Filesystem," (Plan 9), Bell Labs,
> http://www.usenix.org/events/fast02/quinlan/quinlan_html/index.html
>
> Now I see how Git can track blocks of code that are moved - very nice!

Reading that paper reminds me of a very similar application of cryptographic fingerprints/digests/hashes/checksums: hardlink-based hard drive backup software. At home I use storeBackup (http://storebackup.org/), which is written in Perl, for daily backups. At work I used BackupPC (http://backuppc.sourceforge.net/), also written in Perl. It has a more sophisticated interface, including self-service restores, which was a big bonus for me, since users didn't have to pound on my door when they deleted something by accident.

Both systems pool identical files, based on their MD5 fingerprint, by hardlinking them, and use compression on top of that, thus saving a lot of storage while doing away with the mental rigamarole that is the mixture of full/incremental/differential backups. From the point of view of the user they are bona fide full backups, and that's it. (There's a rough sketch of the pooling idea in the PS below.) It looks like the latest versions of storeBackup also chunk large files, à la Venti.

As an example, the /home directory on my home server is ~650 GB (it contains ~440 GB of music files, mostly FLAC). I have 48 full backups of it going back to 2005 at increasing intervals, and they take up ~560 GB. Not bad.

One difference with git (I know this is a gross approximation; just squint hard enough) is that git explicitly tracks and persists file duplication information, while storeBackup and BackupPC let the filesystem take care of that (although both keep lists of fingerprints so that they don't need to be recomputed on every backup run for every past backup).

So now I'm wondering whether git could be used as a backup system. Since it features coalescing of duplicate data and also uses compression (zlib, I think), the storage efficiency should be very similar. The question is what the speed would be like. My dailies take around one hour in normal circumstances (not a lot of large changes in the home dir). Sounds like an experiment in the works... (there's a first cut at the end of the PS).

Bernardo
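PS: Here's a back-of-the-envelope sketch of the pooling idea, in the spirit of storeBackup/BackupPC but emphatically not their actual code. The paths are made up, and for brevity it links source files straight into a pool; the real tools copy into the pool, compress, and hardlink the backup trees to it:

    #!/usr/bin/perl
    # Pool identical files by MD5 fingerprint: the first file seen with
    # a given digest becomes the pool copy, later ones are detected as
    # duplicates. Hardlinks only work within a single filesystem, and
    # there's no compression here -- the real tools handle both.
    use strict;
    use warnings;
    use File::Find;
    use File::Path qw(make_path);
    use Digest::MD5;

    my ($src, $pool) = @ARGV;
    die "usage: $0 SRCDIR POOLDIR\n" unless defined $pool;

    find(sub {
        return unless -f $_;
        open my $fh, '<', $_ or return;
        binmode $fh;
        my $md5 = Digest::MD5->new->addfile($fh)->hexdigest;
        close $fh;

        # Fan the pool out by the first two hex digits so no single
        # directory grows huge.
        my $dir  = "$pool/" . substr($md5, 0, 2);
        my $dest = "$dir/$md5";
        make_path($dir) unless -d $dir;

        if (-e $dest) {
            # Content already pooled; a backup tool would hardlink the
            # backup-tree entry to $dest instead of storing it again.
            print "dup: $File::Find::name\n";
        } else {
            link($_, $dest) or warn "link $File::Find::name: $!\n";
            print "new: $File::Find::name\n";
        }
    }, $src);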
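The git side of the comparison is easy to see from the shell: git names every blob by the SHA-1 of its content, so two identical files map to the same object and are stored exactly once:

    $ echo hello > a.txt
    $ cp a.txt b.txt
    $ git hash-object a.txt b.txt
    ce013625030ba8dba906f756967f9e9ca394464a
    ce013625030ba8dba906f756967f9e9ca394464a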
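And the experiment itself could start as something like this (untested; /backup/home.git is a made-up location). Keeping the repository outside /home avoids backing up the object store along with the data:

    $ git --git-dir=/backup/home.git init
    $ cd /home
    $ time git --git-dir=/backup/home.git --work-tree=/home add -A
    $ git --git-dir=/backup/home.git --work-tree=/home commit -m "daily $(date +%F)"

One reason to be optimistic about the speed: after the first run git caches stat information in its index and only rehashes files whose size or mtime changed, so the unchanged FLACs shouldn't be reread every night. Two caveats for backup use, though: git records only the executable bit (no owners, groups, or full permissions), and it won't store empty directories.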

