Re: [cisco-voip] Heartbeat Failure & SNRD

Daniel Pagan Tue, 10 Jun 2014 08:47:55 -0700

Just a quick wrap-up on this one...
The two defects created for this problem are CSCup27726 and CSCup27133.

- Dan

From: Wes Sisk (wsisk) [mailto:[email protected]]
Sent: Wednesday, May 21, 2014 2:50 PM
To: Daniel Pagan
Cc: [email protected]
Subject: Re: [cisco-voip] Heartbeat Failure & SNRD

Hi Daniel,

Great find!

For the document:
http://www.cisco.com/c/en/us/support/docs/voice-unified-communications/unified-communications-manager-callmanager/46806-cm-crashes-and-shutdowns.html

The initialization process and timers have changed *significantly* since 4.x. 
Some examples include:
CSCsj76788    cp-system request to remove initialization timers
"... remove the initialization timers that are started during CUCM 
initialization.  These timers would previously cause a system restart under 
certain circumstances..."

Still, there is a global maximum timeout. Individual daemons must report that 
they have started and successfully initialized by that time.

Historically, behavior like the one you describe was triggered by service 
parameters that were missing or had incorrect values. This could stem from a 
problem with the database connection ( CSCsc72748 ) or with the contents of the 
database. Other causes include another process grabbing one of the TCP or UDP 
ports required by the ccm process.

ccm had many issues retrieving initialization information from the database in 
early Linux versions. Refinements to Informix and the in-memory database (IMDB) 
have helped significantly.

-Wes

On May 21, 2014, at 9:33 AM, Daniel Pagan <[email protected]> wrote:

Folks:

CUCM ES 8.6.2.24122-1 appears to have an issue where the CallManager heartbeat 
fails to increment after startup, and the condition that triggers it is very 
specific. On a problematic node, SDL traces show the following error exactly one 
hour after the start of the CCM service:

AppError  ||||||Local send blocked: SignalName: Start, DestPID: SNRD[1:100:61:1]

The SDL trace then prints an error stating that CallManager exceeded the 
permitted time for initialization and will restart the application. The CCM 
application restarts, additional SDL traces show the standard creation of 
critical processes, and one hour later the same "Local send blocked" error is 
printed for the SNRD process.
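Since the symptom recurs every hour, it can help to pull every "Local send blocked" line out of a set of SDL traces and see which destination process keeps showing up. The sketch below is a hypothetical helper, not a Cisco tool; its regular expression is built only from the single trace line quoted above, and other CUCM releases may format this error differently.

```python
import re

# Pattern derived from the one SDL trace line quoted in this thread
# (assumption: other releases may use a different layout).
BLOCKED_RE = re.compile(
    r"Local send blocked: SignalName: (?P<signal>\w+), "
    r"DestPID: (?P<process>\w+)\[(?P<pid>[\d:]+)\]"
)


def find_blocked_sends(lines):
    """Return (signal, process, pid) tuples for each 'Local send blocked' line."""
    hits = []
    for line in lines:
        match = BLOCKED_RE.search(line)
        if match:
            hits.append((match["signal"], match["process"], match["pid"]))
    return hits


if __name__ == "__main__":
    sample = [
        "AppError  ||||||Local send blocked: SignalName: Start, DestPID: SNRD[1:100:61:1]",
        "an unrelated trace line",
    ]
    print(find_blocked_sends(sample))  # [('Start', 'SNRD', '1:100:61:1')]
```

Feeding it the traces from each CCM restart cycle should show whether the same process (here SNRD) is the destination every time.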

After seeing the DestPID: SNRD error, I went to a completely different, 
non-problematic lab environment running 8.6.2.24122-1, created a single Remote 
Destination Profile, and restarted the standalone node to force the creation of 
SNRD. CallManager heartbeats now fail to increment in that environment as well, 
and I found another "Local send blocked" error for SNRD. Removing the single 
Remote Destination Profile from the standalone environment and rebooting the 
node resolves the problem. Re-adding it and rebooting again recreates it, which 
makes SNRD the obvious culprit here.

I currently have a TAC case open where they're attempting to recreate the 
problem. It seems no public-facing defect has been filed for this yet. I just 
wanted to give you folks a heads-up.

Related to this, can someone tell me whether this document, specifically the 
section describing MMManInit and process creation, is still accurate? If so, 
what I fail to see in the SDL traces is an InitDone signal from SNRD to 
MMManInit during the 60 minutes between CCM startup and the initialization 
timeout.
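The check described here (a process is created but never reports InitDone before the global initialization timeout) can be sketched as below. This is a hypothetical diagnostic model, not CUCM's actual logic: it assumes trace lines have already been parsed into `(timestamp, process, event)` tuples with made-up event labels `"Created"` and `"InitDone"`, uses the 60-minute window observed in this thread, and `OtherProc` is an invented placeholder process.

```python
from datetime import datetime, timedelta

# Assumed initialization window, taken from the 60 minutes observed in the thread.
INIT_TIMEOUT = timedelta(minutes=60)


def missing_init_done(events, start, timeout=INIT_TIMEOUT):
    """Return processes created before the deadline that never reported InitDone.

    `events` is a list of (timestamp, process_name, event_name) tuples, where
    event_name is "Created" or "InitDone" (hypothetical labels).
    """
    deadline = start + timeout
    created = set()
    done = set()
    for ts, process, event in events:
        if ts > deadline:
            continue  # anything logged after the timeout no longer counts
        if event == "Created":
            created.add(process)
        elif event == "InitDone":
            done.add(process)
    return sorted(created - done)


if __name__ == "__main__":
    t0 = datetime(2014, 5, 21, 9, 0)
    events = [
        (t0 + timedelta(seconds=1), "SNRD", "Created"),
        (t0 + timedelta(seconds=2), "OtherProc", "Created"),
        (t0 + timedelta(seconds=5), "OtherProc", "InitDone"),
    ]
    print(missing_init_done(events, t0))  # ['SNRD']
```

An InitDone that arrives only after the deadline is deliberately ignored, since by then the initialization timeout would already have fired.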

- Daniel

_______________________________________________
cisco-voip mailing list
[email protected]
https://puck.nether.net/mailman/listinfo/cisco-voip
