I'd use the dd/gzip option, though you may want to write the image to another system and have that system do the compression.
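A rough sketch of what I mean, assuming the MDT snapshot shows up as a block device and the other system is reachable over ssh. The device path, host name, output path, and both function names are hypothetical placeholders, not anything from the manual:

```shell
#!/bin/sh
# Hypothetical names: adjust to your own snapshot device and backup host.
SNAPSHOT_DEV=${SNAPSHOT_DEV:-/dev/vgmdt/mdt_snap}   # assumed LVM snapshot of the MDT
BACKUP_HOST=${BACKUP_HOST:-backuphost}              # assumed remote machine
BACKUP_FILE=${BACKUP_FILE:-/backup/mdt.img.gz}      # assumed path on that machine

# Read a device (or file) with dd and write a gzip-compressed image locally.
# Note: in plain sh the pipeline's exit status is gzip's, not dd's.
image_to_gzip() {
    dd if="$1" bs=1048576 | gzip -c > "$2"
}

# Stream the raw image over ssh so the remote host does the compression.
image_to_remote_gzip() {
    dd if="$1" bs=1048576 | ssh "$2" "gzip -c > '$3'"
}

# Example invocation (not run here):
#   image_to_remote_gzip "$SNAPSHOT_DEV" "$BACKUP_HOST" "$BACKUP_FILE"
```

Compressing on the receiving side keeps gzip's CPU load off the MDS.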
If you're going that route, you might want to run fsck on the dd'd image before compressing it, to make sure any errors are fixed.

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Frederik Ferner
Sent: Thursday, September 02, 2010 9:42 AM
To: [email protected]
Subject: [Lustre-discuss] MDT backup (using tar) taking very long

Hi List,

we are currently reviewing our backup policy for our Lustre file system, as backups of the MDT are taking longer and longer.

So far we are creating an LVM snapshot of our MDT, mounting it via ldiskfs, and running getfattr and getfacl followed by tar (RHEL5 version), basically following the instructions from the manual. The tar options include --sparse and --numeric-owner.

At the moment I've got a backup running where the tar process started on Tuesday, so it has now been running for more than 24h. Including the getfattr and getfacl calls (running in parallel), the whole backup has so far been running for more than 48h to back up a 700GB MDT for a 214TB Lustre file system. The tar file created so far is about 2GB compressed with gzip.

Tar is currently using anything between 30% and 100% CPU according to top, and gzip is below 1% CPU usage. Overall the MDS is fairly idle: load is about 1.2 on an 8-core machine, and top reports this for the CPUs:

<snip>
Cpu(s):  4.2%us,  4.5%sy,  0.0%ni, 85.8%id,  5.2%wa,  0.0%hi,  0.2%si,  0.0%st
</snip>

vmstat is not showing any I/O worth mentioning, just a few (10-1000) blocks per second.

Some file system details for the Lustre file system are below. The MDS is running Lustre 1.6.7.2.ddn3.5 plus a patch for bz #22820 on RHEL5.
[bnh65...@cs04r-sc-com01-18 ~]$ lfs df -h
UUID                      bytes      Used  Available  Use%  Mounted on
lustre01-MDT0000_UUID    699.9G     22.1G     677.8G    3%  /mnt/lustre01[MDT:0]
[snip]
filesystem summary:      214.9T    146.6T      68.3T   68%  /mnt/lustre01

[bnh65...@cs04r-sc-com01-18 ~]$ lfs df -ih
UUID                     Inodes     IUsed      IFree IUse%  Mounted on
lustre01-MDT0000_UUID    200.0M     71.0M     129.0M   35%  /mnt/lustre01[MDT:0]
[snip]
filesystem summary:      200.0M     71.0M     129.0M   35%  /mnt/lustre01

Is this comparable to the backup times other people experience using tar? Could this be because tar has to read the whole file (all zeros) in before deciding that it is a sparse file?

For comparison, a backup using dd and gzip 'only' took about 8h, and gzip was using 100% of one CPU core for all of that time, so with a faster compression algorithm this seems a much better option. Are there any dangerous downsides to this approach that I have missed?

Kind regards,
Frederik

--
Frederik Ferner
Computer Systems Administrator          phone: +44 1235 77 8624
Diamond Light Source Ltd.               mob:   +44 7917 08 5110

(Apologies in advance for the lines below. Some bits are a legal requirement and I have no control over them.)

_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss
