Hi Richard,

TSM server: V5.3.3.0 on Solaris, V440/16 GB RAM, 3494 library, 3592 drives
TSM client: V5.3.3.0 on Solaris, V440/16 GB RAM

Our Cyrus mail server is set up on Solaris and keeps the whole mail data permanently synced to a copy in another building over a private FC link. It's a poor man's solution that uses only the available system services, but it runs okay. You may also consider cloning the mail server machine itself, so that in a catastrophic scenario the mail service can move completely to the other location. Our mail data is always in a synchronized state; other sites delay the synchronization to get a kind of restore window from that copy. Because of this HA configuration we hope never to have to restore the full mail data from TSM :-)

With TSM we back up as a normal incremental (no snapshot ...) and do the 'normal' restores of user folders accidentally deleted by users.

Recently I have done a lot of TSM restore tests of our Cyrus mail server (currently 1 filesystem, 5 million files, 280 GB), and these are my experiences:

A complete backup of the whole filesystem (with just one session) runs in about 18 hours, but normally we don't do that, just the incremental. The incremental, with around 80,000 files/10 GB, takes about 4 hours per night.

Our best restore time so far for that single mail server filesystem is 03:49:51 for 4.4 million objects/280 GB. That's pretty good for us, and I am trying to get it down to about 3 hours by balancing the data over more input volumes/disk cache. That best restore time came from a 'fresh' full backup that ended up mainly on two 3592 tapes, with only very little on disk cache. These values (average objects/hour, average data/hour) are in the end the facts that show what has been possible at least once.

A full restore from the normal backup data (not the 'fresh' one), with all the real holes, the holes inside the aggregates and so on, takes about 70-80% longer than that best time.

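For anyone who wants to work out the same averages for their own restore tests, here is a minimal Python sketch. It uses only the figures quoted above (03:49:51, 4.4 million objects, 280 GB, and the 70-80% overhead); the function names hourly_rates and scaled_duration are just made up for this example and have nothing to do with TSM itself.

```python
from datetime import timedelta

# Figures quoted in the post: best full restore of the mail filesystem.
BEST_RESTORE = timedelta(hours=3, minutes=49, seconds=51)
OBJECTS = 4_400_000   # 4.4 million objects
DATA_GB = 280         # 280 GB


def hourly_rates(elapsed, objects, data_gb):
    """Return (objects per hour, GB per hour) for a finished restore."""
    hours = elapsed.total_seconds() / 3600
    return objects / hours, data_gb / hours


def scaled_duration(best, extra_fraction):
    """Elapsed time when a restore takes extra_fraction longer than best."""
    return timedelta(seconds=round(best.total_seconds() * (1 + extra_fraction)))


obj_rate, gb_rate = hourly_rates(BEST_RESTORE, OBJECTS, DATA_GB)
print(f"best case: {obj_rate:,.0f} objects/h, {gb_rate:.1f} GB/h")

# Restore from the normal incremental data: 70-80% longer than the best case.
for extra in (0.70, 0.80):
    print(f"+{extra:.0%}: about {scaled_duration(BEST_RESTORE, extra)}")
```

With these inputs the best case comes out at a bit over 1.1 million objects and roughly 73 GB per hour, and the restore from the normal backup data lands somewhere between six and a half and seven hours.
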
In case of a full disaster, with the HA solution not working either, we would finally do that full restore. While the restore is running no mail delivery would be possible; incoming mail would be queued. Because of that pause, a Cyrus reconstruct of all folders is not necessary, which by itself could take a very long time.

I think everyone should do one full restore test of the mail server at some point, using TSM snapshot or the 'normal' TSM backup data from incrementals, whichever is in use, just to prove what actually happens. The other thing I have come to is to determine the best full-restore throughput that is possible in practice, just to verify the overall status and identify possible bottlenecks.

........................................

Currently I have two problems with the Cyrus backup.

1) Full restore: compared with our own best possible full mail restore time, the +80% is not bad, but it seems that TSM somehow slows the restore down. I always measure a fast start in the first 2-3 hours of the restore (measuring restored files and restored data), and the forecast from those figures always points to a total elapsed time of about 30% over the best possible. Towards the end the restore slows down without any obvious reason, the restore process on the client uses more and more CPU, and it is no wonder that the TSM server shows more and more sessions in 'SendWait' state. To me the bottleneck seems to be 'inside' the TSM software, and I currently have an open PMR on that.

2) Partial restore: restoring just a few hundred files/a few MB may take far too long ... TSM is doing things I cannot understand ... maybe it is a deep architectural problem. Here we help ourselves by disabling the no-query restore (NQR), using for example

  dsmc restore "/mail/imap/j/user/juser/?*"

which runs pretty fast, instead of

  dsmc restore /mail/imap/j/user/juser/

which may take 10 times longer ...

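The restore forecast mentioned under 1) is nothing more than a linear extrapolation from the objects restored so far. A minimal Python sketch of that idea follows. The progress samples in it are invented purely for illustration (they are not my measurements), and the function names forecast_total_hours and interval_rates are hypothetical.

```python
def forecast_total_hours(elapsed_hours, restored, total):
    """Linear forecast: total elapsed time if the average rate so far holds."""
    if restored <= 0:
        raise ValueError("no progress yet, cannot forecast")
    return elapsed_hours * total / restored


def interval_rates(samples):
    """Objects per hour between consecutive (elapsed_hours, restored) samples."""
    return [
        (n1 - n0) / (t1 - t0)
        for (t0, n0), (t1, n1) in zip(samples, samples[1:])
    ]


# Hypothetical progress samples: (elapsed hours, objects restored so far).
# These numbers are invented for illustration; they are not measurements.
TOTAL_OBJECTS = 4_400_000
samples = [(0, 0), (1, 900_000), (2, 1_800_000), (3, 2_600_000),
           (4, 3_000_000), (5, 3_300_000)]

for t, n in samples[1:]:
    total_h = forecast_total_hours(t, n, TOTAL_OBJECTS)
    print(f"after {t} h: forecast {total_h:.2f} h total")

# A falling per-interval rate is what makes the forecast drift upwards.
print("rate per interval:", [round(r) for r in interval_rates(samples)])
```

A forecast that stays flat for the first hours and then keeps climbing, together with a falling per-interval rate, is the pattern described above: fast start, slow end.
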
I hope IBM is aware of that problem, because it's a really painful and annoying one.

........................................

...just some thoughts

Rainer
