Hello,

We started using the Unix/Linux SCOM agents when we installed SCOM 2012 SP1. We recently upgraded to SCOM 2012 R2 (agents included) and have started to see a possible issue with the Unix/Linux SCOM agent heartbeat monitors.
The Windows Health Service Watcher monitor provides diagnostic and recovery tasks. If a SCOM agent on a Windows server stops heartbeating, the diagnostic pings the computer, and if those ping requests go unanswered, the recovery sets the "Computer Not Reachable" monitor. That monitor then produces a "Failed to Connect to Computer" alert, which is acted upon. Using the "Computer Not Reachable" monitor, we were able to forward just those alerts to the Windows admins, informing them when a system was truly unreachable.

With the Unix/Linux heartbeat monitor, this does not seem to be possible. When the Unix/Linux admins receive a "heartbeat failed" alert, they check the system, find that it is not down or unreachable, and are puzzled as to why they received a false positive alert. We realize that "heartbeat failed" alerts can be caused by a momentary or long-term network glitch, the SCOM agent stopping, and so on, but none of these means the server itself is down or unreachable.

So, is there a diagnostic/recovery process within the Unix/Linux Heartbeat monitor that works like the Windows Health Service Watcher? I see a WS-Management Heartbeat ICMP Diagnostic task built into the Unix/Linux Heartbeat monitor; however, it seems that this diagnostic task is part of the heartbeat communications for this monitor rather than a separate component.

We are toying with either building a Ping MP/monitor or importing one already built (e.g. OpsLogix). We would then disable the "heartbeat failed" alerts but keep the agent performance collection functionality. We had this type of scenario when we were running SCOM 2007 SP1/R2, but that was mostly because those versions of SCOM did not have Unix/Linux agents.

Any ideas?
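To make the behavior we are after concrete, here is a rough Python sketch of the decision logic described above: a failed heartbeat triggers a ping, and only an unanswered ping counts as "unreachable". This is an illustration only and not SCOM code; the names `AgentState`, `ping`, and `classify` are made up for this example, and the `ping` flags assume a Linux iputils-style `ping`.

```python
import subprocess
from enum import Enum


class AgentState(Enum):
    HEALTHY = "heartbeat ok"
    AGENT_UNRESPONSIVE = "heartbeat failed, host answers ping"
    HOST_UNREACHABLE = "heartbeat failed, host does not answer ping"


def ping(host, count=3, timeout_s=2):
    """Return True if the host answers ICMP echo (assumes Linux `ping -c/-W`)."""
    result = subprocess.run(
        ["ping", "-c", str(count), "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def classify(heartbeat_ok, host, ping_fn=ping):
    """Mimic the diagnostic/recovery chain: ping only when the heartbeat fails."""
    if heartbeat_ok:
        return AgentState.HEALTHY
    if ping_fn(host):
        # Agent or network path problem; the server itself is still up.
        return AgentState.AGENT_UNRESPONSIVE
    # Only this state should be forwarded as "system unreachable".
    return AgentState.HOST_UNREACHABLE
```

With logic like this, only `HOST_UNREACHABLE` would be forwarded to the Unix/Linux admins, while `AGENT_UNRESPONSIVE` would go to the SCOM team.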
Thanks,
Sven

Sven Wells
Principal Systems Administrator
Communication and Infrastructure Services
PPD, Wilmington NC