>> On Fri, 11 Jan 2008 07:30:42 -0500, James R Owen <[EMAIL PROTECTED]> said:
> Yale University will convert from UWash to Cyrus IMAP email service in
> Q2-2008.

> Searching for Cyrus ref's, I know that UFla, Cornell, Buffalo,
> MPI-FKF, and Uni-Ulm responded to BostonU's 2006-05-16 request for
> Cyrus backup/restore advice.  I hope you and others w/large Cyrus
> experience will respond again!

That's us. :)

> We plan to have 10 Cyrus email backend servers (each w/10*200GB
> FileStores) clustered in two primary datacenters, here and there.
> The five Cyrus backends here will backup to a dedicated TSM service
> there, and vice versa.  Current testing indicates that a DR restore
> of a single 200GB FS from TSM continuous incremental backups on LTO3
> tapes will probably take longer than a week to complete!  Obviously,
> we need a better backup & restore plan.

Cyrus stores mail with one file per message: this means that database
behavior will be unusually prominent in all of your operations.  Keep
that in mind.

We also have 10 back-ends, each of which houses 10 ~60G stores.  So while
smaller, we're in the same order of magnitude; close enough to see the
same architectural issues, I trust.

I am currently backing up each back-end to a separate TSM instance.
Experience has yielded the opinion that this is an excess of caution, but
the previous configuration was 4 back-ends, and putting two of -those- on
the same TSM server was not pleasant.

I do not collocate by filespace.  The number of volumes per node (and
thus, in my scheme, per TSM instance) is sufficiently low that it's not
an issue.

We do nightly incrementals, which finish in a few hours per back-end.  We
stretch them out over much of the evening to keep the peak load on the DB
spindles down.

I kicked off a full restore of one of my file stores in response to this
message.  It was 53G total, and finished in 1:36.  It mounted four tape
drives to begin with:

  11:48 - 12:28   48 min
  11:48 - 12:49   61 min
  11:48 - 12:54   66 min
  11:48 - 13:24   96 min

So that's a total of 271 drive-minutes, or about 4.5 drive-hours.

One of my compatriots, who was watching the back-ends, says that we were
blowing out the IOPS on the LUN, and also blowing out the write cache on
the disk subsystem.  This corroborates my observations of occasional
multi-second sendW and recvW waits on the restore sessions; in this
config my bottleneck was SAN receive transactions.  (Not receive
-bandwidth-, note; transactions.)

So, if I had your filestore sizes, I'd probably be restoring one of them
in 5-6 hours; the arithmetic is sketched below the sig.  I anticipate
there are procedural or equipment wins somewhere in your scenario.  Where
do you think your bottleneck is?


- Allen S. Rout
- whistles "LTO's not great at seeks, doo-dah, doo-dah"
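
P.S.  For anyone who wants to redo the arithmetic, here's a quick Python
sketch.  The linear scaling to a 200GB filestore (and the assumption that
the same drive count and SAN-transaction bottleneck hold at that size) is
an assumption on my part, not a measurement:

  # Back-of-the-envelope math for the restore numbers above.
  # ASSUMPTIONS (not measurements): restore time scales roughly linearly
  # with filestore size, and the same four-drive / SAN-transaction
  # bottleneck holds at 200GB.

  drive_minutes = [48, 61, 66, 96]   # per-drive busy time in the test restore
  total = sum(drive_minutes)         # 271 drive-minutes
  print(total, "drive-minutes =", round(total / 60, 1), "drive-hours")

  restored_gb = 53    # size of the test filestore
  elapsed_min = 96    # wall clock: how long the longest drive stayed busy
  target_gb = 200     # the proposed Yale filestore size

  est_hours = elapsed_min * (target_gb / restored_gb) / 60
  print("Estimated wall clock for a", target_gb, "GB filestore:",
        round(est_hours, 1), "hours")   # ~6 hours, hence "5-6 hours"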
