On Mon, May 18, 2009 at 9:38 PM, Karl Vogel <vogelke+u...@pobox.com> wrote:
> >> On Sun, 17 May 2009 09:12:57 -0700,
> >> Kelly Jones <kelly.terry.jo...@gmail.com> said:
>
> K> I like this plan because it does versioned backups, and doesn't backup
> K> identical files twice. I dislike it because I lose Mozy's unlimited disk
> K> space.
>
> K> % Is there software that already does this?
>
> I have a 3-Tbyte server running FreeBSD-6.1 that does something very
> similar. I don't bother with encrypting the filenames or hashes
> because we control the box, and if I'm not at work, other admins
> might need to restore something quickly.
>
> We have around 3.7 million files from 5 other servers backed up
> under two 1.5-Tbyte filesystems, /mir01 and /mir02. My setup looks
> like this:
>
>    +-----mir01
>    |     +-----HASH
>    |     |     +-----00
>    |     |     |     +-----00
>    |     |     |     +-----01
>    ...
>    |     |     +-----01
>    ...
>    |     |     +-----fe
>    |     |     +-----ff
>    |     +-----server1
>    |     +-----server2
>    +-----mir02
>    |     +-----HASH
>    |     +-----server3
>    |     +-----server4
>    |     +-----server5
>
> The HASH directories have two levels of subdirectories 00-ff.
> That's been more than sufficient to keep directories from getting
> too big; I average around 25 files per directory.
>
> I do hourly backups on the other fileservers using something like the
> find and timestamp method you mentioned, but I ignore 0-length files
> because they always hash to the same value. The backup directories
> for the second fileserver look like this for 5 May 2009:
>
>    +-----mir01
>    |     +-----server2
>    |     |     +-----2009
>    |     |     |     +-----0505
>    |     |     |     |     +-----070700
>    |     |     |     |     |     +-----doc   (filesystem)
>    |     |     |     |     |     +-----home
>    |     |     |     |     +-----080700
>    |     |     |     |     |     +-----doc
>    |     |     |     |     |     +-----home
>    ...
>    |     |     |     |     +-----190700
>    |     |     |     |     |     +-----home
>
> After the backups are rsynced to the backup server, I find any regular
> files with only one link, compute the RMD160 hash of the contents, and
> make a hardlink to the appropriate filename under the HASH directory.
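
For comparison, the step described above could be sketched in Python
roughly as follows. This is not Karl's script, only one reading of his
description. The function names (file_digest, hash_path, dedup_tree)
are made up for illustration. The default digest is SHA-256 rather than
RMD160, because whether hashlib offers "ripemd160" depends on the
OpenSSL build; pass algo="ripemd160" where it is available. Replacing a
newly arrived duplicate with a link to the copy already stored under
HASH is an assumption about how the space saving is achieved, and it
means the duplicate takes on the stored copy's owner and timestamps.
HASH has to live on the same filesystem as the backups, as it does
under /mir01 and /mir02.

```python
import hashlib
import os
import stat


def file_digest(path, algo="sha256", chunk=1 << 20):
    """Hex digest of a file's contents, read in chunks."""
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


def hash_path(hash_root, digest):
    """Two levels of 00-ff subdirectories: HASH/ab/cd/<full digest>."""
    return os.path.join(hash_root, digest[0:2], digest[2:4], digest)


def dedup_tree(backup_root, hash_root, algo="sha256"):
    """Link every new regular file under backup_root into hash_root.

    Returns the number of files that were linked.
    """
    linked = 0
    for dirpath, _dirnames, filenames in os.walk(backup_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            # Only regular files with exactly one link; 0-length files
            # are skipped because they all hash to the same value.
            if not stat.S_ISREG(st.st_mode):
                continue
            if st.st_nlink != 1 or st.st_size == 0:
                continue
            target = hash_path(hash_root, file_digest(path, algo))
            if os.path.exists(target):
                # Content already stored (assumed behaviour): swap the
                # new copy for a hardlink to the stored one.
                tmp = path + ".dedup-tmp"  # hypothetical temp name
                os.link(target, tmp)
                os.replace(tmp, path)
            else:
                # First time this content is seen: register it.
                os.makedirs(os.path.dirname(target), exist_ok=True)
                os.link(path, target)
            linked += 1
    return linked
```

Something like dedup_tree("/mir01/server2/2009/0505", "/mir01/HASH")
would then be run after each rsync. A second pass over the same tree
does nothing, since every file it already handled has more than one
link.
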
> People love to make copies of copies of files, so this really cuts down
> on the disk space used.
>
> The hardlinks make it easy to avoid restoring things that aren't what
> the user had in mind; if a file's been corrupted, I can tell when it
> happened just by looking at the inode, so I don't restore an earlier
> version that's also junk. I can also tell if there were duplicates
> anywhere on the fileserver at the time the user lost the good version;
> it's a lot faster for them to get a known good copy from somewhere
> else on the fileserver than it is to restore over the network.
>
> The software is just a few scripts to do things like find files with
> just one link, compute hashes, do hardlinks, etc. I can put up a tarball
> if anyone's interested.
>

Hello Kelly,

I am doing something similar at a company I work for. I would be
interested to see your scripts to make a comparison.

Thanks,

v

> --
> Karl Vogel                      I don't speak for the USAF or my company
>
> The best way for the Government to maintain its credit is to pay as it
> goes-not by resorting to loans, but by keeping out of debt-through an
> adequate income secured by a system of taxation, external or internal,
> or both.       --Pres. William McKinley's First Inaugural Address, March 4, 1897
> _______________________________________________
> email@example.com mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-questions
> To unsubscribe, send any mail to "freebsd-questions-unsubscr...@freebsd.org"
>

--
network warrior since 2005