Hi all,

This morning I noticed that our TSM server failed to back up one of our nodes. A look at the activity log revealed the following errors:
ANR9999D imgroup.c(1180): ThreadId<52> Error 8 retrieving Backup Objects row for object 0.11583818 (SESSION: 2736)
Oct 25, 2005 11:05:09 PM ANR9999D ThreadId<52> issued message 9999 from: (SESSION: 2736)
Oct 25, 2005 11:05:10 PM ANR9999D smnode.c(7343): ThreadId<52> Session 2736: Invalid Group Id 0,11583818 for ADD function (SESSION: 2736)
Oct 25, 2005 11:05:10 PM ANR9999D ThreadId<52> issued message 9999 from: (SESSION: 2736)
Oct 25, 2005 11:05:10 PM ANR0403I Session 2736 ended for node ***** (WinNT). (SESSION: 2736)

The activity log also shows that the node sent objects to the server, but the server reports that 0 objects have been backed up (probably as a result of the errors above). We have never had a node backup fail for this reason, and all other backups that ran at the same time finished just fine.

While I was searching the adsm.org list for information on this error, I stumbled upon the discussion below. I wonder if anyone has discovered what this error actually means.

We are using Tivoli Storage Manager 5.3.2 on Windows Server 2003.

Thank you very much...

______________________________________
Branko Stanic
System Administrator
Ured registrara za ratne zlocine i organizovani kriminal
Sarajevo, Bosna i Hercegovina
+387 33 707 111

-----Original Message-----
From: ADSM: Dist Stor Manager [mailto:[EMAIL PROTECTED] On Behalf Of Andrew Raibeck
Sent: 4 October 2005 15:28
To: [email protected]
Subject: Re: [ADSM-L] Fw: HELP!!!!

Joni,

Not to be patronizing... but take a deep breath, then another, then a third, and relax. :-)

It is very difficult to diagnose any problem when all one has is a vague (at best) problem description and a handful of assorted error messages. In identifying the source of the trouble, context is nearly (if not completely) everything.

Without knowing anything else about this problem, I would recommend that you start by reviewing your activity log. That would be the ENTIRE log, not just certain messages.
Start with the first of the ANR9999D error messages and work your way backward, trying to get a picture of what sessions, processes, and other events were running on the server when the problem started. Also try searching the IBM web site for instances of ANR9999D plus other keywords that appear in the message text (don't search on numbers that might be instance-specific; search only on non-numeric strings). If you search on ANR9999D alone, you'll get far too many hits.

If you can figure out which clients were running when this occurred, check their error and schedule logs to see what errors they received. What activities were they performing? I see you have a script called NAS_2-DIFFERENTIAL running. That is another event you can examine. Do this for all running sessions and processes. You might have to go several hours back in the activity log from the first of the ANR9999Ds, but this is a start.

I'm not sure where the list of messages below came from, but it is shown in columns that are far too narrow. Consider querying the activity log from an Admin CLI started with the -commadelimited option and redirecting the output to a file. You can then view the messages directly in the file, or load them into a spreadsheet or database for easier reading.

Regards,

Andy

Andy Raibeck
IBM Software Group
Tivoli Storage Manager Client Development
Internal Notes e-mail: Andrew Raibeck/Tucson/[EMAIL PROTECTED]
Internet e-mail: [EMAIL PROTECTED]

IBM Tivoli Storage Manager support web page: http://www-306.ibm.com/software/sysmgmt/products/support/IBMTivoliStorageManager.html

The only dumb question is the one that goes unasked.
The command line is your friend.
"Good enough" is the enemy of excellence.

"ADSM: Dist Stor Manager" <[email protected]> wrote on 2005-10-04 05:13:03:

> I found some other errors as well; do any of these look familiar?
> I've placed a call with IBM, but I'm really at a loss here.
> For our NDMP backups it stated an issue with malloc, but I don't know where this problem is beginning...
>
> Could I have an issue with the size of the bufferpool on this server? Here are the options for this server, and it has 4 processors and 8GB of memory.
>
> Server Option              Option Setting
> -------------------------  ------------------------------------
> CommTimeOut                7,200
> IdleTimeOut                360
> BufPoolSize                419432
> LogPoolSize                1024
> MessageFormat              1
> Language                   en_US
> Alias Halt                 HALT
> MaxSessions                500
> ExpInterval                0
> ExpQuiet                   No
> EventServer                Yes
> MirrorRead DB              Normal
> MirrorRead LOG             Normal
> MirrorWrite DB             Sequential
> MirrorWrite LOG            Parallel
> VolumeHistory              /t01/volhist/hist01
> VolumeHistory              /usr/tivoli/tsm/server/bin/volhist
> Devconfig                  /t01/devconfig/dev01
> Devconfig                  /usr/tivoli/tsm/server/bin/devconfig
> TxnGroupMax                256
> MoveBatchSize              1000
> MoveSizeThresh             2048
> StatusMsgCnt               10
> RestoreInterval            1,440
> UseLargeBuffers            Yes
> DisableScheds              No
> NOBUFPREfetch              No
> AuditStorage               Yes
> REQSYSauthoutfile          Yes
> SELFTUNEBUFpoolsize        Yes
> SELFTUNETXNsize            Yes
> DBPAGEShadow               No
> DBPAGESHADOWFile           dbpgshdw.bdt
> MsgStackTrace              On
> QueryAuth                  None
> LogWarnFullPerCent         75
> ThroughPutDataThreshold    0
> ThroughPutTimeThreshold    0
> NOPREEMPT                  ( No )
> Resource Timeout           60
> TEC UTF8 Events            No
> NORETRIEVEDATE             No
> DNSLOOKUP                  Yes
> TCPPort                    1500
> TcpAdminport               1500
> HTTPPort                   1580
> TCPWindowsize              65536
> TCPBufsize                 16384
> TCPNoDelay                 Yes
> CommMethod                 TCPIP
> CommMethod                 ShMem
> CommMethod                 HTTP
> MsgInterval                1
> ShmPort                    1510
> FileExit                   /t01/log/eventserver(APPEND)
> UserExit
> FileTextExit
> AssistVCRRecovery          Yes
> AcsAccessId                chrs144
> AcsTimeoutX                1
> AcsLockDrive               No
> AcsQuickInit               No
> SNMPSubagentPort           1521
> SNMPSubagentHost           127.0.0.1
> SNMPHeartBeatInt           5
> TECHost
> TECPort                    0
> UNIQUETECevents            No
> UNIQUETDPTECevents         No
> Async I/O                  No
> Direct I/O                 Yes
> SHAREDLIBIDLE              No
> 3494Shared                 No
>
> DATE_TIME                   MSGNO  MESSAGE
> --------------------------  -----  -------
> 2005-10-03 22:02:43.000000  9999   ANR9999D imgroup.c(1180): ThreadId<511> Error 8 retrieving Backup Objects row for object 0.295703482
>     Callchain of previous message: 0x0000000100017d94 outDiagf <- 0x00000001003dead4 imIsGroupLeader <- 0x0000000100385564 SmNodeSession <- 0x000000010043bb38 HandleNodeSession <- 0x0000000100441964 smExecuteSession <- 0x0000000100434478 SessionThread <- 0x0000000100008078 StartThread <- 0x09000000002f4460 _pthread_body <-
>     (SESSION: 40004)
> 2005-10-03 22:02:43.000000  9999   ANR9999D smnode.c(7056): ThreadId<511> Session 40004: Invalid Group Id 0,295703482 for ADD function
>     Callchain of previous message: 0x0000000100017d94 outDiagf <- 0x000000010038558c SmNodeSession <- 0x000000010043bb38 HandleNodeSession <- 0x0000000100441964 smExecuteSession <- 0x0000000100434478 SessionThread <- 0x0000000100008078 StartThread <- 0x09000000002f4460 _pthread_body <-
>     (SESSION:
> 2005-10-03 22:37:12.000000  8311   ANR8311E An I/O error occurred while accessing drive SL8500 (/dev/rmt7) for LOCATE operation, errno = 78.
>     (SESSION: 29020, PROCESS: 680)
> 2005-10-03 22:37:13.000000  1165   ANR1165E Error detected for file in storage pool TAPE_ORACLE: Node FJSU102, Type Backup, File space /p01, fsId 18, File name /app/cyb/esp/MED-ESPSystemAgent/spool/CM_DEMO/MAIN/MEDAXBP.1366/VMSDBSAP.
>     (SESSION: 29020, PROCESS: 680)
> 2005-10-03 22:37:13.000000  3523   ANR3523W GENERATE BACKUPSET: Retrieve failed - error on input storage device.
>     (SESSION: 29020, PROCESS: 680)
> 2005-10-03 22:37:13.000000  3503   ANR3503E Generation of backup set for FJSU102 as FJSU102_BACKUPSET.295467854 failed.
>     (SESSION: 29020, PROCESS: 680)
> 2005-10-03 22:37:14.000000  2032   ANR2032E GENERATE BACKUPSET: Command failed - internal server error detected.
>     (SESSION: 29020, PROCESS: 680)
> 2005-10-03 22:50:53.000000  9999   ANR9999D imgroup.c(1180): ThreadId<437> Error 8 retrieving Backup Objects row for object 0.295537065
>     Callchain of previous message: 0x0000000100017d94 outDiagf <- 0x00000001003dead4 imIsGroupLeader <- 0x0000000100385564 SmNodeSession <- 0x000000010043bb38 HandleNodeSession <- 0x0000000100441964 smExecuteSession <- 0x0000000100434478 SessionThread <- 0x0000000100008078 StartThread <- 0x09000000002f4460 _pthread_body <-
>     (SESSION: 39214)
> 2005-10-03 22:50:53.000000  9999   ANR9999D smnode.c(7056): ThreadId<437> Session 39214: Invalid Group Id 0,295537065 for ADD function
>     Callchain of previous message: 0x0000000100017d94 outDiagf <- 0x000000010038558c SmNodeSession <- 0x000000010043bb38 HandleNodeSession <- 0x0000000100441964 smExecuteSession <- 0x0000000100434478 SessionThread <- 0x0000000100008078 StartThread <- 0x09000000002f4460 _pthread_body <-
>     (SESSION: 39214)
> 2005-10-03 22:51:42.000000  423    ANR0423W Session 41306 for administrator ( ) refused - administrator name not registered.
>     (SESSION: 41306)
> 2005-10-03 22:52:14.000000  8311   ANR8311E An I/O error occurred while accessing drive SL8500 (/dev/rmt7) for OFFL operation, errno = 78.
>     (SESSION: 29020, PROCESS: 680)
> 2005-10-03 23:34:45.000000  8769   ANR8769E External media management function DISMOUNT returned result=LIBRARY_ERROR.
>     (SESSION: 29020, PROCESS: 680)
> 2005-10-03 23:34:45.000000  8469   ANR8469E Dismount of LTO volume T00897 from drive SL8500 (/dev/rmt7) in library SL8500 failed.
>     (SESSION: 29020, PROCESS: 680)
> 2005-10-03 23:34:46.000000  1410   ANR1410W Access mode for volume T00897 now set to "unavailable".
>     (SESSION: 29020, PROCESS: 680)
> 2005-10-03 23:46:25.000000  9999   ANR9999D ssremote.c(503): ThreadId<136> Unable to open remote session of type 1.
>     Callchain of previous message: 0x0000000100017d94 outDiagf <- 0x00000001004a4178 ssInitStoreRemote <- 0x000000010066ad10 AfInitStoreRemote <- 0x0000000100667024 bfInitStoreRemote <- 0x00000001006a05c4 DoBackup <- 0x00000001006a3e5c AdmBackupNode <- 0x0000000100163168 AdmCommandLocal <- 0x00000001001642ac admCommand <- 0x000000010015b180 RunScript <- 0x000000010015cd30 DoRunScript <- 0x0000000100163168 AdmCommandLocal <- 0x00000001001642ac admCommand <- 0x000000010064ec58 SmExecScheduledCommand <- 0x000000010064ee54 smScheduledConsoleSession <- 0x000000010064c860 CsRunCmdThread <- 0x0000000100008078 StartThread <- 0x09000000002f4460 _pthread_body <-
>     (SESSION: 38192, PROCESS: 963)
> 2005-10-03 23:46:25.000000  2032   ANR2032E BACKUP NODE: Command failed - internal server error detected.
>     (SESSION: 38192, PROCESS: 963)
> 2005-10-03 23:46:25.000000  1463   ANR1463E RUN: Command script NAS_2-DIFFERENTIAL completed in error.
>     (SESSION: 38192, PROCESS: 963)
> 2005-10-03 23:46:25.000000  2752   ANR2752E Scheduled command NAS_2-DIFFERENTIAL failed.
>     (SESSION: 38192, PROCESS: 963)
>
> ********************************
> Joni Moyer
> Highmark
> Storage Systems
> Work:(717)302-6603
> Fax:(717)302-5974
> [EMAIL PROTECTED]
> ********************************
>
> "Richard Sims" <[EMAIL PROTECTED]>
> 10/04/2005 07:56 AM
> To: "Joni Moyer" <[EMAIL PROTECTED]>
> cc:
> Subject: Re: HELP!!!!
>
> Joni - That's an error we haven't seen before.
>
> Your best course of action is to call TSM Support.
>
> Richard Sims
>
> On Oct 4, 2005, at 7:32 AM, Joni Moyer wrote:
>
> > Has anyone ever seen this message before? I have TSM 5.2.4 running on AIX 5.2, and it seems like this error message occurred and then all processing stopped; it almost looks like TSM stopped and restarted itself. Any suggestions are appreciated!!!! I'm completely lost in this situation.
> > Thank you in advance!
> >
> > 10/03/05 23:46:25 ANR9999D ssremote.c(503): ThreadId<136> Unable to open remote session of type 1. Callchain of previous message: 0x0000000100017d94 outDiagf <- 0x00000001004a4178 ssInitStoreRemote <- 0x000000010066ad10 AfInitStoreRemote <- 0x0000000100667024 bfInitStoreRemote <- 0x00000001006a05c4 DoBackup <- 0x00000001006a3e5c AdmBackupNode <- 0x0000000100163168 AdmCommandLocal <- 0x00000001001642ac admCommand <- 0x000000010015b180 RunScript <- 0x000000010015cd30 DoRunScript <- 0x0000000100163168 AdmCommandLocal <- 0x00000001001642ac admCommand <- 0x000000010064ec58 SmExecScheduledCommand <- 0x000000010064ee54 smScheduledConsoleSession <- 0x000000010064c860 CsRunCmdThread <- 0x0000000100008078 StartThread <- 0x09000000002f4460 _pthread_body <-
> >                   (SESSION: 38192, PROCESS: 963)
> > 10/03/05 23:46:25 ANR2032E BACKUP NODE: Command failed - internal server error detected.
> >                   (SESSION: 38192, PROCESS: 963)
> > 10/03/05 23:46:25 ANR2753I (NAS_2-DIFFERENTIAL):ANR2032E BACKUP NODE: Command failed -
> >                   (SESSION: 38192)
> > 10/03/05 23:46:25 ANR2753I (NAS_2-DIFFERENTIAL):internal server error detected.
> >                   (SESSION: 38192)
> > 10/03/05 23:46:25 ANR1463E RUN: Command script NAS_2-DIFFERENTIAL completed in error.
> >                   (SESSION: 38192, PROCESS: 963)
> >
> > ********************************
> > Joni Moyer
> > Highmark
> > Storage Systems
> > Work:(717)302-6603
> > Fax:(717)302-5974
> > [EMAIL PROTECTED]
> > ********************************
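Andy's procedure (export the activity log from an Admin CLI (dsmadmc) session started with -commadelimited, find the first ANR9999D, work backward several hours, and search IBM's site on the non-numeric parts of the message) can be sketched in Python. This is a minimal, hypothetical sketch, not a TSM tool: it assumes the exported file has the DATE_TIME, MSGNO and MESSAGE columns shown in Joni's listing, with timestamps in the 2005-10-03 22:02:43.000000 form. The function names, the three-hour look-back window, and the "drop any token containing a digit, keep the message ID" rule are illustrative choices. The sample rows are abbreviated from Joni's log above, with the callchains removed.

```python
# Hypothetical sketch: triage a comma-delimited export of the TSM activity log.
# Assumes each row is DATE_TIME,MSGNO,MESSAGE (the columns in Joni's listing).
import csv
import io
import re
from datetime import datetime, timedelta

TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"  # e.g. 2005-10-03 22:02:43.000000

# Rows abbreviated from the log in this thread (callchains removed).
SAMPLE = """DATE_TIME,MSGNO,MESSAGE
2005-10-03 22:02:43.000000,9999,"ANR9999D imgroup.c(1180): ThreadId<511> Error 8 retrieving Backup Objects row for object 0.295703482"
2005-10-03 22:02:43.000000,9999,"ANR9999D smnode.c(7056): ThreadId<511> Session 40004: Invalid Group Id 0,295703482 for ADD function"
2005-10-03 22:37:12.000000,8311,"ANR8311E An I/O error occurred while accessing drive SL8500 (/dev/rmt7) for LOCATE operation, errno = 78."
2005-10-03 23:46:25.000000,2032,"ANR2032E BACKUP NODE: Command failed - internal server error detected."
"""


def load_actlog(text):
    """Parse CSV text into (timestamp, msgno, message) tuples sorted by time."""
    rows = []
    for fields in csv.reader(io.StringIO(text)):
        if len(fields) < 3:
            continue  # blank or malformed line
        try:
            ts = datetime.strptime(fields[0].strip(), TS_FORMAT)
        except ValueError:
            continue  # header row or unexpected timestamp format
        rows.append((ts, fields[1].strip(), fields[2].strip()))
    rows.sort(key=lambda row: row[0])  # stable sort: same-second rows keep file order
    return rows


def context_before_first(rows, msg_id="ANR9999D", hours=3):
    """Entries from `hours` before the first `msg_id` message up to and including it."""
    first = next((row for row in rows if row[2].startswith(msg_id)), None)
    if first is None:
        return []
    start = first[0] - timedelta(hours=hours)
    return [row for row in rows if start <= row[0] <= first[0]]


def search_keywords(message):
    """Keep the message ID plus every token without digits (likely not instance-specific)."""
    tokens = re.findall(r"[^\s():,]+", message)
    if not tokens:
        return ""
    msg_id, rest = tokens[0], tokens[1:]
    return " ".join([msg_id] + [t for t in rest if not any(c.isdigit() for c in t)])


if __name__ == "__main__":
    for ts, msgno, message in context_before_first(load_actlog(SAMPLE)):
        print(ts, msgno, search_keywords(message))
```

For the smnode.c message, search_keywords keeps "ANR9999D smnode.c Session Invalid Group Id for ADD function" and drops the thread, session, and object numbers, which follows Andy's advice to search on non-numeric strings. In practice the look-back window would need to be widened until the sessions and processes active before the first ANR9999D are all visible.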
