Hi, I have been seeing a very high number of Supervisor 720 (WS-SUP720) crashes in many customers' environments. Basically, the SP stops receiving heartbeats from the RP.
The following errors are the most common ones, seen sometimes on the SP and sometimes on the RP.

For SP:

%CPU_MONITOR-SP-6-NOT_HEARD: CPU_MONITOR messages have not been heard for 150 seconds [6/1]
%CPU_MONITOR-SP-3-TIMED_OUT: CPU_MONITOR messages have failed, resetting system [6/1]

For RP:

%CPU_MONITOR-6-NOT_HEARD: CPU_MONITOR messages have not been heard for %d seconds [%d/%d]

CPU monitor messages have not been detected for a significant amount of time. [dec] is the number of seconds. A timeout is likely to occur soon, which will reset the system. This error can be caused by a badly seated module or by high traffic in the EOBC channel.

*Recommended Action:* Verify that all modules are seated properly in the chassis. Pull out the module mentioned in the message and inspect the backplane and module for bent pins or hardware damage. If the message persists after reseating all the modules, a hardware problem may exist, such as a defective module or chassis.

Is this a common problem that anybody else is seeing on their 6500s with Sup720? Is this a common hardware defect in the EOBC channel that blocks communication between the RP and SP? If so, what are the preventive actions?

Krunal
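For anyone who wants to check their own syslog archives for the same pattern, here is a minimal Python sketch that picks out the CPU_MONITOR messages quoted above and counts them per module. It is only an illustration: the function names `parse_cpu_monitor` and `count_by_module` are made up for this example, and reading the bracketed pair such as [6/1] as module number followed by a second index is an assumption based on the "module mentioned in the message" wording, not something confirmed here.

```python
import re
from collections import Counter

# Matches the NOT_HEARD / TIMED_OUT messages quoted in this post.
# The "SP-" part is present in SP logs and absent in RP logs.
CPU_MONITOR_RE = re.compile(
    r"%CPU_MONITOR-(?P<sp>SP-)?(?P<severity>\d)-(?P<event>NOT_HEARD|TIMED_OUT):"
    r".*?\[(?P<module>\d+)/(?P<index>\d+)\]"
)
SECONDS_RE = re.compile(r"not been heard for (\d+) seconds")


def parse_cpu_monitor(line):
    """Return a dict describing a CPU_MONITOR message, or None for other lines."""
    m = CPU_MONITOR_RE.search(line)
    if m is None:
        return None
    secs = SECONDS_RE.search(line)
    return {
        "reporter": "SP" if m.group("sp") else "RP",
        "event": m.group("event"),
        # Assumption: the first bracketed number is the module (slot) number.
        "module": int(m.group("module")),
        "index": int(m.group("index")),
        # Only NOT_HEARD carries a seconds value; TIMED_OUT yields None.
        "seconds": int(secs.group(1)) if secs else None,
    }


def count_by_module(lines):
    """Count events per (module, event) so a recurring module stands out."""
    counts = Counter()
    for line in lines:
        rec = parse_cpu_monitor(line)
        if rec:
            counts[(rec["module"], rec["event"])] += 1
    return counts
```

Feeding it the lines of a saved log (for example `count_by_module(open("syslog.txt"))`, where the file name is a placeholder) gives a quick view of whether the NOT_HEARD warnings cluster on one module before a TIMED_OUT reset.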
