If you recall, not too long ago we had a disastrous disk failure that led to a 15-day restore of millions of files and hundreds of gigabytes of storage.
As is the case in many situations of this kind, this disaster has brought the TSM BACKUP/RESTORE process into the spotlight. It has also enabled the "loosening of purse-strings" so that we can handle this kind of situation in a more timely manner, should it happen again. It has also made us think about other huge systems that have terabytes of storage (with new ones coming online shortly).

So, the only reasonable solution we came up with is to perform IMAGE backups on a regular basis (bi-weekly? monthly?), with regular incremental backups to supplement the image backups. If we have another situation like this, we would restore from the image backup first and then pick up the missing pieces from the incremental backups. This should greatly shorten the time to restore 7+ million files totaling over 800GB (and still growing).

However, rather than try to put all this load on one TSM AIX server (which is also servicing 20+ other systems and growing), we have purchased another, beefier AIX system. Our thoughts on this are as follows:

1. Use the new TSM server exclusively for the IMAGE backups. The new server would have an LTO-2 drive with 200GB cartridges. We wouldn't need a very big DB, since only IMAGES would be backed up to it. There would be no need for a big disk landing-zone, since the IMAGES would go straight to tape.

2. Still perform the daily incrementals to the current TSM AIX server. In a DR situation, we would restore from the last image backup and then apply the incremental backups that have occurred since that image backup was taken.

The other option would be to move these big systems completely to the new TSM server and perform both image and incremental backups to it. However, that would then require additional processing/management on the new server (e.g. reclamation, expiration, etc.).

What are your thoughts about these scenarios? Is there anything wrong with keeping IMAGE backups on a different TSM server? How do you folks with large systems (i.e. 1-terabyte+ servers) handle these situations?
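
For a rough sense of scale, here is a back-of-envelope sketch in Python. It only uses the figures quoted above (15 days, 7 million files, 800GB, 200GB cartridges). Two assumptions to note: the file count and size are the current numbers and may not match what was actually restored after the failure, and the drive throughput passed to image_restore_hours is a placeholder that you would replace with a rate measured on your own hardware. The function names are made up for illustration.

```python
import math


def average_restore_rate(total_gb, file_count, days):
    """Average throughput implied by a file-level restore that took `days`."""
    seconds = days * 24 * 3600
    mb_per_s = total_gb * 1000 / seconds   # decimal units: 1 GB = 1000 MB
    files_per_s = file_count / seconds
    return mb_per_s, files_per_s


def cartridges_needed(total_gb, cartridge_gb=200):
    """Minimum cartridges for one image copy at native (uncompressed) capacity."""
    return math.ceil(total_gb / cartridge_gb)


def image_restore_hours(total_gb, drive_mb_per_s):
    """Streaming time for an image restore at a sustained drive rate.

    drive_mb_per_s is an assumed value; measure it rather than trusting
    a datasheet number.
    """
    return total_gb * 1000 / drive_mb_per_s / 3600


if __name__ == "__main__":
    mb_s, files_s = average_restore_rate(800, 7_000_000, 15)
    print(f"file-level restore averaged {mb_s:.2f} MB/s, {files_s:.1f} files/s")
    print(f"cartridges per image copy: {cartridges_needed(800)}")
    # 20 MB/s is a placeholder sustained rate, not a measured figure.
    print(f"image restore at 20 MB/s: {image_restore_hours(800, 20):.1f} hours")
```

The point of the comparison is that a file-by-file restore is bounded by per-file overhead rather than by raw tape speed, which is why an image restore followed by incrementals looks attractive here. The estimate ignores tape mounts, network limits, and the time needed to apply the incrementals afterwards.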
