John,

A few things come to mind. Which nodes are pinning the recovery log? In my experience it is always a few slow nodes (typically with a lot of small files) that pin the log. Try to find out which ones do, and try to improve those nodes so that they back up faster. That's a hell of a job when you have 500 nodes, but try to find the ones that take longer than 4-5 hours or have really slow throughput. A speed/duplex mismatch on a TSM client has killed my log performance more than once. You can look in TSM reporting for the slowest nodes.
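A minimal Python sketch of that triage, assuming you have already exported per-node backup results (node name, elapsed hours, bytes transferred) from TSM reporting into a list. The `NodeBackup` record layout, the `slow_nodes` helper, the sample nodes and the 5 MB/s throughput cutoff are all hypothetical; only the 4-hour limit comes from the advice above.

```python
from dataclasses import dataclass


@dataclass
class NodeBackup:
    # Hypothetical record layout for one node's nightly backup result.
    node: str
    elapsed_hours: float
    bytes_sent: int


def slow_nodes(records, max_hours=4.0, min_mb_per_sec=5.0):
    """Flag nodes that run longer than max_hours or move data slower than
    min_mb_per_sec (an assumed cutoff). Returns (node, hours, MB/s) tuples,
    longest-running first."""
    flagged = []
    for r in records:
        seconds = r.elapsed_hours * 3600
        mb_per_sec = (r.bytes_sent / (1024 * 1024)) / seconds if seconds > 0 else 0.0
        if r.elapsed_hours > max_hours or mb_per_sec < min_mb_per_sec:
            flagged.append((r.node, r.elapsed_hours, round(mb_per_sec, 2)))
    flagged.sort(key=lambda t: t[1], reverse=True)
    return flagged


# Made-up sample data, for illustration only.
GIB = 1024 ** 3
records = [
    NodeBackup("NODE_A", 5.5, 20 * GIB),  # long-running and slow
    NodeBackup("NODE_B", 1.0, 36 * GIB),  # about 10 MB/s, fine
    NodeBackup("NODE_C", 2.0, 10 * GIB),  # short, but low throughput
]

if __name__ == "__main__":
    for node, hours, rate in slow_nodes(records):
        print(f"{node}: {hours} h, {rate} MB/s")
```

Sorting by elapsed time puts the nodes most likely to be holding the log open at the top, which is where a duplex or small-file problem is worth checking first.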
IMHO, TSM 6.1.x will not solve your problem. Another solution would be to turn your cell phone off every other day ;-)

Good luck,

2010/2/14 Dury, John C. <[email protected]>
> We have about 500 nodes and a backup window from 5pm until 7am. I have our backup schedule set up so that about 30 nodes do an incremental per hour, with a few exceptions. We have a 3T disk storage pool and 4 LTO4 drives in our tape library. Our dbbackuptrigger is set at logfull 30% and numincremental of 4. Our recovery log is filling up almost once per hour while backups are running and is not emptying fast enough before it hits 80%, at which point all backups slow to a crawl until it is emptied below 80%. Sometimes the recovery log is pinned at 70% or so and another backup kicks off immediately, which again does not empty the log fast enough, and the whole system goes into slowdown once the recovery log passes 80%. Expiration, which used to run in about 6 hours, is not completing even after running for 24 hours. Our DB is about 97gig and about 74% full. The recovery log is maxed at 13gig. I don't see anything out of the ordinary in the activity log. The TSM server is AIX 5.3.10.1 TL10 running on an IBM 9131-52A in a logical partition with 20 CPUs configured and about 32G of RAM. The TSM DB and disk storage pools are attached to a Clariion CX3-80 via 4G HBAs. I have the recovery log and TSM DB set to use different HBAs than the disk or tape storage pools, so the HBAs aren't fighting each other. I've read the tuning and performance manual and matched our settings to its suggestions, with some small exceptions.
>
> We have purchased new hardware, a monster of a box, to move the whole system to Linux, since we want to get to TSM v6.x eventually, hopefully sooner rather than later. AIX hardware and support is tremendously expensive when compared to an Intel-based box and, like a lot of people, we have a very small budget for anything IT related.
>
> One of the biggest problems we are having is the recovery log filling up too quickly and not emptying fast enough. Even with a log full trigger of 30%, the incremental backup won't finish before the recovery log hits 80%, and with the log full setting so low, we are doing TSM DB backups almost every hour while clients are backing up. This really seems excessive to me. Why would an incremental backup of the TSM DB take an hour or so to run, and is it normal for the recovery log to fill up so fast while backups are running?
>
> We even attempted a reorg of the TSM DB, but unfortunately it was going to run much longer than our window allowed, so it had to be cancelled. I'm going to try again next weekend and hopefully talk the powers that be into a 24-hour window for the reorg. We did a reorg years ago and the performance improvements were amazing, e.g. expiration ran in less than an hour. I know that is a band-aid, but I have to do something until I can get to version 6, when I can have a bigger recovery log and a new, more powerful server in place.
>
> I guess I'm just not sure what to look at at this point and, frankly, I'm exhausted. Our help desk calls me every day at 6am or earlier because "TSM is running slow again".
>
> Any suggestions on what else to look at? (Sorry for such a fragmented email. I've had about 3 hours sleep at this point.)

-- 
Kind Regards,
Groetje,
Marcel Anthonijsz
