Asheesh wrote:
> A problem with Keith's suggestion is that if any user at all is
> running rsync, then the dirvish cron job will fail to start.

On Wed, Dec 10, 2008 at 10:34:29AM -0600, Richard wrote:
> Keith, try this line on for size. (You'll have to substitute your vault
> path or tree: path)
>
> if ps ax | grep r\\sync.*$VAULTPATH > /dev/null ; then
>
>
> OK...well I'm off to do other things. I *STILL* haven't found WHY that
> grep statement works!

Keith responds:

I appreciate the contributions - for many if not most situations
they may work better, but not in my own case. Put them on the wiki,
too!

Dirvish is designed to be customized with four scripts: pre-server,
pre-client, post-client, and post server, which run before and after
the individual rsync run. This moves complexity out of dirvish, which
is a Good Thing.

The "preclient" script runs on the client, and knows about rsync jobs
running only on the client. A failed preclient script only stops the
one rsync job that is associated with it - the rest of the dirvish
spawned rsync jobs are unaffected.

I schedule dirvish to run at around 2AM. There is normally no reason
to run any other rsync jobs at that time - if there is a human running
rsync at that time, they probably don't want to be slowed down by
dirvish.

If there is a bandwidth limit to a particular client, there is no
advantage to running multiple rsync commands at once - rsync is
designed to push data in parallel anyway, and optimizes for available
bandwidth. Two or more rsync jobs just slow each other down, thrash
disk and memory, and make completion time for both more uncertain.
Better to schedule them back-to-back, sequentially.

I do a lot with the following sequence:

1) Front wrapper script before dirvish runs. I use this to mount
   disks and prepare failure counters and such. About 100 lines
   of bash.

2) Pre-server. I don't do much with this beyond log variables, but
   it is a good place to check for disk space and wait for other jobs
   to complete on the server. A ping with an upper time limit might
   be good here, if the pipe to the client is busy for other reasons.

3) Pre-client. I log variables from the client, and now check for
   running rsync jobs. This is a good place to set up stuff on the
   client, perhaps mount drives, set up security, and lock out other
   processes until dirvish is done. I've also considered mounting the
   client's VMware guests, and backing up their virtual drives via
   file sharing.

4) Dirvish/Rsync. The basic operation, as simple and reliable as
   possible. Since some backups are running through end-to-end VPN
   tunnels, I have considered turning off rsync's ssh encryption on
   them.

5) Post-client. Reverse pre-client setups, and run df and fdisk on
   the client to aid reconstruction of client disks.

6) Post-server. A good place to parse results and increment failure
   counters. I also make symlinks to the tree of each branch when
   they successfully complete.

7) Back wrapper script. This is where I count failures, look at disk
   usage, and summarize the results of all the runs. I also unmount
   backup drives and turn off buses and controllers, so the drives
   are isolated and can be swapped into the fireproof safe or moved
   offsite. I am still using PATA backup drives, but they are
   connected to the CPU through PATA/SATA adapters, because SATA
   hotswap is well supported in 2.6.XX kernels. Another 100 or so
   lines of bash.

So my tendency is to leave dirvish alone, and do the complex and
situation-specific stuff with relatively simple pre- and post-
scripts. I should post the scripts I use to the wiki. Real Soon Now.

Keith

-- 
Keith Lofstrom          [EMAIL PROTECTED]         Voice (503)-520-1993
KLIC --- Keith Lofstrom Integrated Circuits --- "Your Ideas in Silicon"
Design Contracting in Bipolar and CMOS - Analog, Digital, and Scan ICs
_______________________________________________
Dirvish mailing list
[email protected]
http://www.dirvish.org/mailman/listinfo/dirvish
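
For readers who want a starting point for the pre-client check for
running rsync jobs described in step 3 above, here is a minimal sketch.
It is not Keith's script (those were not posted in this message); the
function names rsync_in_listing and preclient_check are made up for
illustration, and it assumes a ps that accepts "-e -o args=" (POSIX
and procps do). It anchors the match at the start of each command
line, so the grep process itself and unrelated commands that merely
mention rsync are not counted.

```shell
#!/bin/sh
# Hypothetical pre-client helper; names and messages are illustrative only.

# Read a process listing (one command line per row) on stdin and succeed
# if any row starts with an rsync executable, with or without a path.
rsync_in_listing() {
    grep -Eq '^([^ ]*/)?rsync( |$)'
}

# Return 0 when no rsync is running on this client, 1 otherwise.
# "ps -e -o args=" prints only the command line of every process.
preclient_check() {
    if ps -e -o args= | rsync_in_listing; then
        echo "pre-client: rsync already running on this client" >&2
        return 1
    fi
    return 0
}
```

A pre-client script could end with "preclient_check || exit 1", on the
assumption (consistent with the description above) that a failing
pre-client script stops only the rsync job associated with it.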
