Thanks Ryan, I'll check it out

On Mon, Dec 3, 2018 at 15:22, Ryan Ratliff (rratliff) <rratl...@cisco.com>
wrote:

> There most certainly will be, but what it is depends on the hardware.
> It's a server, so there will be some type of management agent that has
> probably been trying to get somebody's attention for a few weeks.
>
> I wouldn't be surprised if there is a non-green LED or two if you look at
> the disks themselves.
>
> -Ryan
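If physical access to the box is a problem, the platform CLI can usually
answer the same question remotely. Assuming an appliance-model MCS/UCS
server (and with the caveat that the output depends on the RAID
controller), something like this should show the disk and array state:

    admin:show hardware
    admin:utils create report hardware

show hardware prints the platform inventory, including the RAID controller
and logical drive status, and utils create report hardware writes a fuller
hardware report to the active log partition that can be pulled off the
server afterwards with file get.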
> On Dec 3, 2018, at 12:18 PM, Nilson Costa <nilsonl...@gmail.com> wrote:
>
> Is there any way to troubleshoot the disks to see which one is defective?
>
> Regards
>
> On Mon, Dec 3, 2018 at 15:08, Ryan Ratliff (rratliff) <rratl...@cisco.com>
> wrote:
>
>> #1 0x044a9935 in raise () from /lib/tls/libc.so.6
>> #2 0x044ab399 in abort () from /lib/tls/libc.so.6
>> #3 0x0842e457 in preabort () at ProcessCMProcMon.cpp:80
>> #4 0x0842fe7c in CMProcMon::verifySdlRouterServices () at
>> ProcessCMProcMon.cpp:720
>>
>> The ccm process is killing itself because it isn't getting enough
>> resources.
>>
>> Nov 29 17:26:12 CMBL-03-01 local7 2 : 1: CMBL-03-01.localdomain: Nov 29
>> 2018 19:26:12.340 UTC : %UC_CALLMANAGER-2-CallManagerFailure:
>> %[HostName=CMBL-03-01][IPAddress=192.168.183.3][Reason=4][Text=CCM
>> Intentional Abort: SignalName: SIPSetupInd, DestPID:
>> SIPD[1:100:67:7]][AppID=Cisco
>> CallManager][ClusterID=StandAloneCluster][NodeID=CMBL-03-01]: Indicates
>> an internal failure in Unified CM
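As an aside, backtraces like the one above can be produced on the server
itself; the platform CLI can list the cores and symbolize them in place.
From memory (exact syntax may vary by CUCM version, and <core file name>
is a placeholder):

    admin:utils core active list
    admin:utils core active analyze <core file name>

where <core file name> is one of the names printed by the list command;
the analyze step produces a gdb-style backtrace for every thread in the
core.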
>> So much good info in the syslog. Here's a super-useful tidbit.
>>
>> Nov 28 03:59:23 CMBL-03-01 local7 2 : 1543: CMBL-03-01.localdomain: Nov
>> 28 2018 05:59:23.840 UTC : %UC_RTMT-2-RTMT_ALERT:
>> %[AlertName=CallProcessingNodeCpuPegging][AlertDetail= Processor load
>> over configured threshold for configured duration of time. Configured
>> high threshold is 90 %. tomcat (2 percent) uses most of the CPU.
>> Processor_Info:
>>
>> For processor instance 1: %CPU= 99, %User= 2, %System= 2, %Nice= 0,
>> %Idle= 0, %IOWait= 97, %softirq= 0, %irq= 0.
>>
>> For processor instance _Total: %CPU= 93, %User= 2, %System= 1, %Nice= 0,
>> %Idle= 7, %IOWait= 90, %softirq= 0, %irq= 0.
>>
>> For processor instance 0: %CPU= 86, %User= 2, %System= 1, %Nice= 0,
>> %Idle= 14, %IOWait= 83, %softirq= 0, %irq= 0.
>>
>> For processor instance 3: %CPU= 87, %User= 2, %System= 2, %Nice= 0,
>> %Idle= 13, %IOWait= 83, %softirq= 0, %irq= 0.
>>
>> For processor instance 2: %CPU= 99, %User= 4, %System= 1, %Nice= 0,
>> %Idle= 0, %IOWait= 96, %softirq= 0, %irq= 0.
>> ][AppID=Cisco AMC Service][ClusterID=][NodeID=CMBL-03-01]: RTMT Alert
>>
>> Looking back just a bit further, and there are a TON of these.
>>
>> Nov 15 21:22:00 CMBL-03-01 local7 2 : 582: CMBL-03-01.localdomain: Nov
>> 15 2018 23:22:00.256 UTC : %UC_RTMT-2-RTMT_ALERT:
>> %[AlertName=HardwareFailure][AlertDetail= At Thu Nov 15 21:22:00 BRST
>> 2018 on node 192.168.183.3, the following HardwareFailure events
>> generated: hwStringMatch : Nov 15 21:21:26 CMBL-03-01 daemon 4 Director
>> Agent: LSIESG_DiskDrive_Modified 500605B0027C6D50 Command timeout on PD
>> 01(e0xfc/s1) Path 500000e116ac4ce2, CDB: 2a 00 10 98 b9 9d 00 00 08 00
>> Sev: 3. AppID : Cisco Syslog Agent ClusterID : NodeID : CMBL-03-01
>> TimeStamp : Thu Nov 15 21:21:26 BRST 2018 hwStringMatch : Nov 15
>> 21:21:26 CMBL-03-01 daemon 4 Director Agent: LSIESG_AlertIndication
>> 500605B0027C6D50 Command timeout on PD 01(e0xfc/s1) Path
>> 500000e116ac4ce2, CDB: 2a 00 10 98 b9 9d 00 00 08 00 Sev: 3. AppID :
>> Cisco Syslog Agent ClusterID : NodeID : CMBL-03-01 TimeStamp : Thu Nov
>> 15 21:21:27 BRST 2018 hwStringMatch : Nov 15 21:21:26
>> CMBL-03-01][AppID=Cisco AMC Service][ClusterID=][NodeID=CMBL-03-01]:
>> RTMT Alert
>>
>> You've lost or are in the middle of losing at least one disk drive. The
>> server probably lost them all at the same time on the 13th, and the OS
>> marked the entire filesystem read-only.
>>
>> -Ryan
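The %IOWait= 90+ numbers in the CPU alert support this: the processors
aren't busy computing, they're blocked waiting on storage. As for which
drive, if I'm reading the LSI event format correctly, PD 01(e0xfc/s1) in
the alerts above already names it: physical drive 01, enclosure 0xfc,
slot 1. The same events should also be searchable in the OS messages log
straight from the CLI, e.g.:

    admin:file search activelog syslog/messages "Command timeout on PD"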
>> On Dec 3, 2018, at 9:28 AM, Nilson Costa <nilsonl...@gmail.com> wrote:
>>
>> Hello All,
>>
>> I'm deploying a new CUCM at a customer that has an old one working just
>> as call routing for a Genesys call-center system.
>>
>> As you can see in the picture below, they have some MGCP gateways
>> connected to this CUCM where the calls come in; via some CTI route
>> points controlled by Genesys, the calls are routed to two Avaya PBXs or
>> to another CUCM.
>>
>> <image.png>
>>
>> On November 13th they lost access to Tomcat on the Publisher. When we
>> looked at the server, several services were restarting, including Cisco
>> CallManager, just on the Publisher.
>>
>> We decided to reboot the whole cluster, but after the reboot we are
>> facing some weird issues. Most of them are not that relevant, I think,
>> but there is one we are really worried about.
>>
>> The Cisco CallManager process is still restarting randomly and
>> generating some core dumps. I'm attaching those logs here, and I'm also
>> attaching the syslogs from the Publisher.
>>
>> Can anybody here on the group help me find out what is triggering the
>> Cisco CallManager restarts?
>>
>> --
>> Nilson Lino da Costa Junior
>> <coredump.txt><publiser-syslog-29-11.txt>
>
> --
> Nilson Lino da Costa Junior

--
Nilson Lino da Costa Junior

_______________________________________________
cisco-voip mailing list
cisco-voip@puck.nether.net
https://puck.nether.net/mailman/listinfo/cisco-voip