Andy,

Thanks so much for the complete update on this. While I have not seen any of these problems, I very much appreciate that you take the time to explain to us what is going on.
Kelly J. Lipp
Storage Solutions Specialists, Inc.
PO Box 51313
Colorado Springs, CO 80949
[EMAIL PROTECTED] or [EMAIL PROTECTED]
www.storsol.com or www.storserver.com
(719)531-5926
Fax: (240)539-7175

-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:[EMAIL PROTECTED]]On Behalf Of Andrew Raibeck
Sent: Monday, October 29, 2001 10:34 AM
To: [EMAIL PROTECTED]
Subject: TSM Server v4.2.1

Hello all,

We have put our full attention on addressing the early problems seen in 4.2.1, and either have a fix already available or expect to deliver one shortly. Despite following our strict development process, focusing significant resource on design reviews, testing, and running a successful beta with several large customers, we had some defect escapes to the field that you may have encountered.

Our plan is to deliver a fixtest called 4.2.1.6 shortly with all of the fixes currently available on various platforms rolled up into one level. We have corrected problems relating to mount point management, LTO devices, 3494 libraries, and a server crash. Applicable APARs are IC31961, IC31823, IC31831, IC31691, and IC31884. While some fixes are available in various prior patch levels, the rollup fix will address all of these problems. If you have problems in these areas, but your problems are not covered by the description in these APARs, please contact Tivoli service.

We have the fixes running in our test environments and at some customer accounts. We have confidence the fixes will correct these reported problems. Ultimately we believe the changes made to these code paths that were introduced into 4.2.1 will be of great benefit and will improve your satisfaction with our product. We apologize for letting these defects escape our process.

For your convenience, the text of the APARs listed above appears below my signature information.
Regards,

Andy

Andy Raibeck
IBM Software Group
Tivoli Storage Manager Client Development
Internal Notes e-mail: Andrew Raibeck/Tucson/IBM@IBMUS
Internet e-mail: [EMAIL PROTECTED]

The only dumb question is the one that goes unasked.
The command line is your friend.
"Good enough" is the enemy of excellence.

------------ APAR IC31691 ------------

ABSTRACT: LIBRARY AUDIT ON 349X MAY NOT CORRECT CATEGORY MISMATCHES

ERROR DESCRIPTION: The LIBRARY AUDIT command for the 3494 and 3495 libraries may not be able to correct category mismatches of private volumes.

LOCAL FIX: Recovery Procedure:

1) Stop activity to your library (i.e. mounting/dismounting of tapes) if any activity is going on.
2) Determine the name of the library from the AIX perspective (the default is usually lmcp0). In the procedure that follows, replace lmcp0 with whatever name is appropriate for your system. (lsdev -Cc tape should list the tape devices, and you can find the library name there.)
3) Determine whether you are using the default categories or whether you have specified categories. The default categories in TSM are 300 and 301 and can be found via a Q libr command. If you are using the defaults, the hex numbers you will need below are 012E for scratch and 012C for private, and you can go to the next step now. If you are NOT using the defaults, you will have to convert the numbers shown by the Q libr command for scratch and private from decimal to hex. When you convert the number for scratch, you must add 1 to it before you convert. (This is because Q libr shows the category for 3490, and the category for 3590 is one higher.)
4) In TSM, run an SQL statement to find all private volumes and redirect the output to a file:
       select volume_name from libvolumes where status='Private' > filename
5) At the AIX prompt, edit the file created in the above step to remove the header lines. There are usually at least 2 lines before the first volume is listed. The file should contain only volume names.
6) Verify that several of the items in your list are in fact in the incorrect category on the 3494 by issuing the following command against a couple of the volumes, replacing vol_name with one of your volumes:
       mtlib -l /dev/lmcp -qV -V vol_name
   For example:
       mtlib -l /dev/lmcp0 -qV -V 000027
   Look at the category listed in the output. If you are using the defaults, a private volume should be listed as 012C, but you are probably seeing 012E.
7) Run the command to update the category to the private category. In the following command, replace lmcp0 with whatever your 3494 library name is on AIX, and replace filename with the file name you specified in step 4. Please note that you should fully qualify the file name:
       mtlib -l /dev/lmcp0 -C -L filename -t"012C"
8) Double-check that a couple of the volumes were in fact correctly changed. I usually check at least the first and last items in the list.
9) Re-enable activity to the tape library.

How to avoid having to do this again:

1) Avoid running an audit library or halting and restarting your server.
2) If you must stop and restart your server, verify that your private volumes are in the correct category by taking one or two volumes and issuing the mtlib command against them to display the 3494 category.
3) If the problem occurs again, run the procedure listed above.

------------ APAR IC31823 ------------

ABSTRACT: CANNOT UPGRADE FROM 3.7.4 TO 4.2 WITH LTO VOLUMES

ERROR DESCRIPTION: An attempted upgrade from TSM 3.7.4 to TSM 4.2.0, with an LTO library attached and LTO volumes in the database, fails with the following messages when trying to bring up the server:

    ANR9999D pvr.c(2449): ThreadId<0> Unsupported function for device class 20.
    ANR9999D asinit.c(341): ThreadId<0> Error converting volume attribs for devclass LTOCLASS.

This problem was addressed in APAR IC29442 for the same issue when upgrading from 3.7.4 to 4.1.

INITIAL IMPACT: High

LOCAL FIX: Upgrade to 4.1.3 first (where fix for IC29942 is present).
Then upgrade to 4.2. The procedure for upgrading to 4.1.3 might vary depending on platform.

------------ APAR IC31831 ------------

ABSTRACT: LTO DEVICES WON'T WORK AFTER UPGRADE TO 4.2.1

ERROR DESCRIPTION: If the TSM server is upgraded to 4.2.1 and has existing LTO devices, they will not work. They will fail with the following errors:

    ANR8337I LTO volume XXX mounted in drive DRIVE1 (/dev/rmt1).
    ANR8442E MOUNT REQUEST: Volume XXX in library 3584LTO is currently in use.
    ANR1401W Mount request denied for volume XXX

A PVR MMS trace shows the following:

    mmsscsi.c(8489): Preparing private volume XXX in library 3584LTO
    mmslib.c(7303): Obtaining reservation for volume XXX in library 3584LTO; activity=4.
    mmsscsi.c(1314): Problem verifying/reserving volume XXX is in library 3584LTO; rc = 4.
    pvr.c(8156): PVR I/O agent (37) finished OPEN request; rc=4.

This only occurs after an upgrade to 4.2.1; a new install should not have the problem.

INITIAL IMPACT: HIGH

LOCAL FIX: N/A

------------ APAR IC31884 ------------

ABSTRACT: V4.2.1 SERVER CORE DUMPING, ANR7837S ON LOCKCYCLE02

ERROR DESCRIPTION: TSM Server V4.2.1 core dumps with ANR7837S Internal error LOCKCYCLE02 detected. The error messages are in dsmserv.err.

Traceback in dsmserv.err:

    ANR7838S Server operation terminated.
    ANR7837S Internal error LOCKCYCLE02 detected.
    0x100085A4 pkLogicAbort
    0x100303F0 CheckLockCycles
    0x100324C0 TmFindDeadlock
    0x100322A4 TmDeadlockDetector
    0x10006DB4 StartThread
    0xD00081FC _pthread_body
    ANR7833S Server thread 1 terminated in response to program abort
    ANR7833S Server thread 2 terminated in response to program abort
    .............

The LOCKCYCLE02 error indicates that the problem is related to transactions between the storage agent and the TSM server. TSM sets a waiter flag in the lock request. Situations can occur where the lock request is aborted. The abort causes a LOCKCYCLE02 problem because the deadlock detector woke up and went looking for waiters.
Because there is a small window after the abort code signals the lock waiter (the mutex is released to allow the receiver to respond), the deadlock detector was able to start looking for deadlocks in that window. Since the request had been satisfied by being aborted, there were no locks being waited on (but the flag was still set). Hence, TSM aborted because there was a waiter not waiting on anything.

LOCAL FIX: Set the RESOURCETIMEOUT value in dsmserv.opt to a higher timeout value. (See the administrator reference for more information.) A higher timeout value should allow the lock waiter flag to clear.

------------ APAR IC31961 ------------

ABSTRACT: MOUNT RESERVATION ANR8447E (INTERNAL DEFECT 31934)

ERROR DESCRIPTION: During mount point processing, PVR obtains information on drives only as it needs it. Multiple calls to obtain the path information for the same drive may be issued, which is expected behaviour. There are instances where the "no drives available" message is correct -- all drives are in use and there is no need to force the dismount of an idle volume or to wait for a dismounting volume.
............
The problem is that when TSM is looking for idle volumes to steal, TSM never considered mount points in the reserved state (or mpClean state in TSM levels prior to V421).
>>>>>>>>>>>>>>>
For example: If the mount limit is 4, the request is for one more mount point, and there are
  - 1 reserved mount point
  - 2 open mount points
  - 1 idle mount point
TSM would see that it did not need to force an idle dismount because 2 open mount points + 1 idle mount point + 1 new request is equal to 4. However, the math should have included the 1 reserved mount point.
..............
(This APAR documents internal defect 31934.)

LOCAL FIX: N/A
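
The arithmetic in the IC31961 example can be written out as a short sketch. This is only an illustration of the numbers given in the APAR text, not the actual TSM server logic; the function name needs_idle_dismount and its parameters are hypothetical.

```python
def needs_idle_dismount(mount_limit, open_points, idle_points, reserved_points,
                        requested=1, count_reserved=True):
    """Return True if satisfying the request would exceed the mount limit,
    meaning an idle volume would have to be dismounted first.

    Hypothetical helper: it only mirrors the arithmetic in the APAR example.
    """
    in_use = open_points + idle_points
    if count_reserved:
        # The corrected math also counts mount points in the reserved state.
        in_use += reserved_points
    return in_use + requested > mount_limit


# Numbers from the APAR example: mount limit 4, 2 open, 1 idle, 1 reserved.
faulty = needs_idle_dismount(4, 2, 1, 1, count_reserved=False)
fixed = needs_idle_dismount(4, 2, 1, 1, count_reserved=True)
print(faulty, fixed)  # False True
```

With the reserved mount point left out, the total is 4 and no dismount appears necessary; with it included, the total is 5, which exceeds the mount limit of 4.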

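
For step 3 of the IC31691 recovery procedure above, the decimal-to-hex conversion for non-default categories can be sketched as follows. This is a minimal illustration that assumes the values are read off the Q libr output by hand; the function name category_hex is hypothetical and not part of TSM or mtlib.

```python
def category_hex(private_dec, scratch_dec):
    """Convert the decimal categories shown by Q libr to the 4-digit hex
    values used with mtlib.

    Per the APAR text, 1 is added to the scratch category before converting,
    because the 3590 category is one higher than the 3490 category shown.
    """
    private_hex = format(private_dec, "04X")
    scratch_hex = format(scratch_dec + 1, "04X")
    return private_hex, scratch_hex


# TSM defaults from the APAR: private 300, scratch 301.
print(category_hex(300, 301))  # ('012C', '012E')
```

The default values reproduce the 012C (private) and 012E (scratch) figures quoted in the procedure.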