Hello,

We started using the Unix/Linux SCOM agents when we installed SCOM 2012 SP1.
We recently upgraded to SCOM 2012 R2, including the agents, and have started
to see a possible issue with the Unix/Linux SCOM agent heartbeat monitors.

The Windows Health Service Watcher monitor provides diagnostic and recovery
tasks. If a SCOM agent on a Windows server stops heartbeating, a diagnostic
pings the computer. If those ping requests go unanswered, a recovery runs that
sets the "Computer Not Reachable" monitor. The "Computer Not Reachable" monitor
then produces a "Failed to Connect to Computer" alert, which is acted upon.
Using the "Computer Not Reachable" monitor, we were able to forward just those
alerts to the Windows admins, informing them only when a system was truly
unreachable.
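To make the Windows behaviour we are trying to reproduce explicit, here is a
minimal Python sketch of that diagnostic/recovery chain. It models only the
decision logic as described above; it is not SCOM management pack code, and the
names Outcome, evaluate and should_notify_admin are hypothetical.

```python
from enum import Enum
from typing import Callable


class Outcome(Enum):
    HEALTHY = "healthy"
    # Heartbeat missed but the host still answers ping: an agent or network
    # problem, not a down server.
    HEARTBEAT_ONLY = "heartbeat failed, host answers ping"
    # Heartbeat missed and ping unanswered: the "Computer Not Reachable"
    # state, which produces the "Failed to Connect to Computer" alert.
    NOT_REACHABLE = "Failed to Connect to Computer"


def evaluate(heartbeat_ok: bool, ping: Callable[[], bool]) -> Outcome:
    """Simplified model of the watcher's diagnostic/recovery chain."""
    if heartbeat_ok:
        return Outcome.HEALTHY
    # Diagnostic: the ping only runs after a heartbeat failure.
    if ping():
        return Outcome.HEARTBEAT_ONLY
    # Recovery: mark the computer as not reachable.
    return Outcome.NOT_REACHABLE


def should_notify_admin(outcome: Outcome) -> bool:
    # Forward only truly unreachable systems to the server admins.
    return outcome is Outcome.NOT_REACHABLE


if __name__ == "__main__":
    for heartbeat, reachable in [(True, True), (False, True), (False, False)]:
        result = evaluate(heartbeat, lambda r=reachable: r)
        print(heartbeat, reachable, result.value, should_notify_admin(result))
```

The point of the sketch is the middle case: a missed heartbeat on a host that
still answers ping is never routed to the admins.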

With the Unix/Linux heartbeat monitor, this does not seem possible. When the
Unix/Linux admins receive a "heartbeat failed" alert, they check the system,
find that it is not down (or unreachable), and are puzzled as to why they
received a false-positive alert. We realize that "heartbeat failed" alerts can
be caused by a momentary or long-term network glitch, the SCOM agent stopping,
etc., but none of these means the actual server is down or unreachable. So, is
there a diagnostic/recovery process within the Unix/Linux heartbeat monitor
that works like the Windows Health Service Watcher?

I see a WS-Management Heartbeat ICMP Diagnostic task built into the Unix/Linux
Heartbeat Monitor. However, this diagnostic task seems to be part of the
heartbeat communications for this monitor rather than a separate component with
its own functionality.

We are considering either building a ping MP/monitor or importing an existing
one (e.g., OpsLogix). We would then disable the "heartbeat failed" alerts but
keep the agent performance collection functionality. We used this kind of setup
when we were running SCOM 2007 SP1/R2, but mostly because those versions of
SCOM did not have Unix/Linux agents.

Any ideas?

Thanks,
Sven

Sven Wells
PRINCIPAL SYSTEMS ADMINISTRATOR
Communication and Infrastructure Services
TIP - Technology, Innovation and Performance

PPD
Wilmington NC HQ

Phone +1 910 558 6870
sven.we...@ppdi.com
www.ppdi.com
PPD LSS Yellow Belt





