Is there any way to troubleshoot the disks to see which one is defective?
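
In the meantime, one thing I can try is to pull the drive identifiers straight
out of the syslog, since the HardwareFailure alerts quoted below name the
failing drive (PD 01, which reads like enclosure 0xfc, slot 1). A minimal
sketch, assuming the attached publiser-syslog-29-11.txt export and that the
LSI lines keep the exact format shown further down (both are assumptions on
my part):

import re
from collections import Counter

# Assumed path: the exported publisher syslog attached to this thread.
SYSLOG = "publiser-syslog-29-11.txt"

# The LSI Director Agent events report the failing drive as e.g.
# "Command timeout on PD 01(e0xfc/s1) Path 500000e116ac4ce2, ..."
pd_re = re.compile(
    r"Command timeout on PD (\d+)\(e(0x[0-9a-fA-F]+)/s(\d+)\)\s+Path\s+([0-9a-fA-F]+)"
)

timeouts = Counter()
with open(SYSLOG, errors="replace") as f:
    for line in f:
        for pd, enclosure, slot, path in pd_re.findall(line):
            timeouts[(pd, enclosure, slot, path)] += 1

for (pd, enclosure, slot, path), n in timeouts.most_common():
    print(f"PD {pd}: enclosure {enclosure}, slot {slot}, "
          f"SAS path {path} -> {n} command timeouts")

If every timeout points at the same enclosure/slot, that is presumably the
drive to replace; if several slots show up, the controller or backplane might
be the better suspect.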

Regards

On Mon, Dec 3, 2018 at 3:08 PM Ryan Ratliff (rratliff) <rratl...@cisco.com> wrote:

> #1  0x044a9935 in raise () from /lib/tls/libc.so.6
> #2  0x044ab399 in abort () from /lib/tls/libc.so.6
> #3  0x0842e457 in preabort () at ProcessCMProcMon.cpp:80
> #4  0x0842fe7c in CMProcMon::verifySdlRouterServices () at
> ProcessCMProcMon.cpp:720
>
>
> The ccm process is killing itself because it isn’t getting enough
> resources.
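
(For what it's worth, the CMProcMon::verifySdlRouterServices frame in that
backtrace looks like a watchdog-style check: if the SDL router stops making
progress, the monitor aborts the process on purpose so it restarts and leaves
a core. A rough sketch of that general pattern, purely illustrative, with
invented names and an invented timeout, not Cisco's actual code:

import os
import signal
import time

# Illustrative watchdog pattern only; names and values are invented and this
# is not the real CMProcMon/ccm code.
HEARTBEAT_TIMEOUT_S = 30   # assumed threshold

def verify_sdl_router_services(get_heartbeat, last_beat, last_progress):
    """Abort the whole process if the monitored heartbeat stops advancing."""
    beat = get_heartbeat()
    now = time.monotonic()
    if beat != last_beat:
        return beat, now                      # router is still making progress
    if now - last_progress > HEARTBEAT_TIMEOUT_S:
        print("SDL router stalled; aborting intentionally to force a restart")
        os.kill(os.getpid(), signal.SIGABRT)  # the equivalent of preabort()/abort()
    return last_beat, last_progress

The point being: the abort itself is only the symptom; the starved SDL router
is the cause.)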
>
> Nov 29 17:26:12 CMBL-03-01 local7 2 : 1: CMBL-03-01.localdomain: Nov 29
> 2018 19:26:12.340 UTC :  %UC_CALLMANAGER-2-CallManagerFailure:
> %[HostName=CMBL-03-01][IPAddress=192.168.183.3][Reason=4][Text=CCM
> Intentional Abort: SignalName: SIPSetupInd, DestPID:
> SIPD[1:100:67:7]][AppID=Cisco
> CallManager][ClusterID=StandAloneCluster][NodeID=CMBL-03-01]: Indicates an
> internal failure in Unified CM
>
>
> So much good info in the syslog.
> Here’s a super-useful tidbit.
>
> Nov 28 03:59:23 CMBL-03-01 local7 2 : 1543: CMBL-03-01.localdomain: Nov 28
> 2018 05:59:23.840 UTC :  %UC_RTMT-2-RTMT_ALERT: 
> %[AlertName=CallProcessingNodeCpuPegging][AlertDetail=
> Processor load over configured threshold for configured duration of time .
> Configured high threshold is 90 % tomcat (2 percent) uses most of the CPU.
>
>  Processor_Info:
>
>  For processor instance 1: %CPU= 99, %User= 2, %System= 2, %Nice= 0,
> %Idle= 0, %IOWait= 97, %softirq= 0, %irq= 0.
>
>  For processor instance _Total: %CPU= 93, %User= 2, %System= 1, %Nice= 0,
> %Idle= 7, %IOWait= 90, %softirq= 0, %irq= 0.
>
>  For processor instance 0: %CPU= 86, %User= 2, %System= 1, %Nice= 0,
> %Idle= 14, %IOWait= 83, %softirq= 0, %irq= 0.
>
>  For processor instance 3: %CPU= 87, %User= 2, %System= 2, %Nice= 0,
> %Idle= 13, %IOWait= 83, %softirq= 0, %irq= 0.
>
>  For processor instance 2: %CPU= 99, %User= 4, %System= 1, %Nice= 0,
> %Idle= 0, %IOWait= 96, %softirq= 0, %irq= 0.
>  ][AppID=Cisco AMC Service][ClusterID=][NodeID=CMBL-03-01]: RTMT Alert
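
The striking part of that alert is that almost all of the load is %IOWait
rather than %User or %System, i.e. the CPUs are mostly waiting on the disks
rather than doing work. A quick sketch of flagging that straight from the
alert text (the filename is a placeholder and the cut-off values are just
assumptions):

import re

# Placeholder path: the exported syslog / RTMT alert text.
text = open("publiser-syslog-29-11.txt", errors="replace").read()

# Matches the Processor_Info lines, e.g.
#  "For processor instance 2: %CPU= 99, %User= 4, %System= 1, ... %IOWait= 96"
# DOTALL in case the alert text is line-wrapped in the export.
proc_re = re.compile(
    r"For processor instance (\S+): %CPU=\s*(\d+), %User=\s*(\d+), %System=\s*(\d+)"
    r".*?%IOWait=\s*(\d+)",
    re.DOTALL,
)

for inst, cpu, user, system, iowait in proc_re.findall(text):
    cpu, user, system, iowait = int(cpu), int(user), int(system), int(iowait)
    if iowait >= 50 and user + system <= 20:   # arbitrary cut-offs for "I/O-bound"
        print(f"instance {inst}: CPU {cpu}% but IOWait {iowait}% "
              f"-> waiting on disk, not computing")

Which lines up with the hardware alerts below: the CPU pegging looks like a
consequence of the dying disk, not the other way around.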
>
>
> Looking back just a bit further, there are a TON of these.
>
> Nov 15 21:22:00 CMBL-03-01 local7 2 : 582: CMBL-03-01.localdomain: Nov 15
> 2018 23:22:00.256 UTC :  %UC_RTMT-2-RTMT_ALERT: %[
> AlertName=HardwareFailure][AlertDetail=     At Thu Nov 15 21:22:00 BRST
> 2018 on node 192.168.183.3, the following HardwareFailure events generated:
>  hwStringMatch : Nov 15 21:21:26 CMBL-03-01 daemon 4 Director Agent: 
> LSIESG_DiskDrive_Modified
> 500605B0027C6D50 Command timeout on PD 01(e0xfc/s1) Path
> 500000e116ac4ce2, CDB: 2a 00 10 98 b9 9d 00 00 08 00 Sev: 3. AppID : Cisco
> Syslog Agent ClusterID :  NodeID : CMBL-03-01  TimeStamp : Thu Nov 15
> 21:21:26 BRST 2018   hwStringMatch : Nov 15 21:21:26 CMBL-03-01 daemon 4
> Director Agent: LSIESG_AlertIndication 500605B0027C6D50 Command timeout on
> PD 01(e0xfc/s1) Path 500000e116ac4ce2, CDB: 2a 00 10 98 b9 9d 00 00 08 00
> Sev: 3. AppID : Cisco Syslog Agent ClusterID :  NodeID : CMBL-03-01
>  TimeStamp : Thu Nov 15 21:21:27 BRST 2018   hwStringMatch : Nov 15
> 21:21:26 CMBL-03-01][AppID=Cisco AMC
> Service][ClusterID=][NodeID=CMBL-03-01]: RTMT Alert
>
>
> You’ve lost or are in the middle of losing at least one disk drive. The
> server probably lost them all at the same time on the 13th, and the OS
> marked the entire filesystem read-only.
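
If the filesystem really did flip to read-only, that should be visible from
the OS side. A generic Linux check, assuming you can get someone with shell
access to read /proc/mounts on the node (nothing CUCM-specific here):

# Generic Linux check: list any filesystems mounted read-only.
with open("/proc/mounts") as mounts:
    for entry in mounts:
        device, mountpoint, fstype, options = entry.split()[:4]
        if "ro" in options.split(","):
            print(f"{mountpoint} ({device}, {fstype}) is mounted read-only")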
>
> -Ryan
>
> On Dec 3, 2018, at 9:28 AM, Nilson Costa <nilsonl...@gmail.com> wrote:
>
> Hello All,
>
> I'm deploying a new CUCM at a customer that has an old one working just as
> call routing for a Genesys call-center system.
>
> As you can see in the picture below, they have some MGCP gateways connected
> to this CUCM where the calls come in and, via some CTI route points
> controlled by Genesys, the calls are routed to two Avaya PBXs or to another CUCM.
>
> <image.png>
> On November 13th they lost access to Tomcat on the Publisher; when we
> looked at the server, several services were restarting, including Cisco
> CallManager, just on the Publisher.
> We decided to reboot the whole cluster, but after the reboot we are facing
> some weird issues that are not that relevant, I think, but there is one we
> are really worried about.
>
> The Cisco CallManager process is still restarting randomly and generating
> some coredumps. I'm attaching those logs here, and I'm also attaching the
> syslogs from the publisher.
>
> Can anybody here in the group help me find out what is triggering the
> Cisco CallManager restarts?
>
> --
> Nilson Lino da Costa Junior
> <coredump.txt><publiser-syslog-29-11.txt>

-- 
Nilson Lino da Costa Junior
_______________________________________________
cisco-voip mailing list
cisco-voip@puck.nether.net
https://puck.nether.net/mailman/listinfo/cisco-voip
