The remote SCOM 2012 management servers have an avg. latency of 32ms
(NC-based servers) and 125ms (UK-based server).  Up until mid-July these
remote Mgmt servers were working fine.  As far as I know, the network
bandwidth/technology has not changed since mid-July.

 

Also, in our SCOM 2007 environment we have always had geographically
based Mgmt servers without any issues, which included Gateways.  The
avg. latency on these is anywhere from 27ms up.

 

The main reason we set up geographically based management servers is so
that, if a network component (e.g. a router/switch) is lost between our
core sites, we won't get false Alert notifications for the systems that
have been cut off.  If we have all agents report back to, say, an Austin
mgmt. server and then all of a sudden we lose a router/switch in, say, the
UK, Alerts will go out that these agents are down, even though they are
not.  This has happened in the past, and the folks who get these alerts
are not happy when they are able to log on to these systems but SCOM
"says" they are down.

We also have not been able to find a way to have SCOM 'recognize' that a
router/switch is down and suppress the resulting agent health alerts.
Other management systems, e.g. Nagios, can do this, but so far we've not
seen anything that states SCOM can do it.
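
To make the idea concrete, here is a minimal Python sketch of
dependency-based alert suppression, similar in spirit to parent/child host
relationships in Nagios. It is not SCOM or Nagios code: the topology, the
node names and the function root_cause_alerts are all hypothetical and
only illustrate the logic described above.

```python
# Hypothetical topology: each node maps to the upstream device it depends on.
# None means the node is reachable directly from the monitoring site.
PARENTS = {
    "uk-router": None,
    "uk-agent-1": "uk-router",
    "uk-agent-2": "uk-router",
    "austin-agent-1": None,
}


def root_cause_alerts(down, parents):
    """Return only the down nodes whose upstream devices are all up.

    A down node behind a down upstream device is treated as unreachable
    rather than failed, so no alert is raised for it.
    """
    alerts = []
    for node in sorted(down):
        parent = parents.get(node)
        suppressed = False
        seen = set()
        # Walk up the chain; stop on a down ancestor or on a cycle.
        while parent is not None and parent not in seen:
            if parent in down:
                suppressed = True
                break
            seen.add(parent)
            parent = parents.get(parent)
        if not suppressed:
            alerts.append(node)
    return alerts


# The UK router fails, so both UK agents also appear to be down.
print(root_cause_alerts({"uk-router", "uk-agent-1", "uk-agent-2"}, PARENTS))
```

In this sketch only the router would be alerted on; the two agents behind
it are suppressed because their upstream device is already known to be
down.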

Another reason for geographically based management servers is to cut
down on the traffic that agents generate: if agents are assigned to the
management servers in their own region, that traffic is reduced on the
network between sites.  My thinking on this could be off a bit.

 

Thanks,

Sven

 

From: [email protected]
[mailto:[email protected]] On Behalf Of Kevin Holman
Sent: Sunday, July 28, 2013 1:35 PM
To: [email protected]
Subject: [msmom] RE: SCOM 2012 Some Mgt Svrs stopped inserting data into
Ops DB.

 

What is the latency between the remote management servers, and the other
management servers in the same datacenter as the OpsMgr databases?

 

The assumption is that no management server should be deployed where
network latency is greater than 5ms.  There are VERY few WAN
technologies that can maintain less than 5ms latencies, therefore it is
not recommended to deploy management servers across geographic locations
EVER... *except* for some VERY specific disaster recovery scenarios, and
even those scenarios must maintain less than 20ms latency and require
some level of customization in order to maintain resource pool
resiliency and availability.
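
As a rough illustration of the thresholds above, the following Python
sketch checks the average latencies reported in this thread (32ms for NC,
125ms for the UK) against the 5ms and 20ms figures. The function name and
the category labels are hypothetical; only the numbers come from the
messages.

```python
# Thresholds quoted in the message above, in milliseconds.
SAME_SITE_MAX_MS = 5   # general guidance for management servers
DR_MAX_MS = 20         # upper bound for specific disaster recovery scenarios


def classify_latency(avg_ms):
    """Classify an average latency against the two quoted thresholds."""
    if avg_ms <= SAME_SITE_MAX_MS:
        return "within guidance"
    if avg_ms < DR_MAX_MS:
        return "disaster recovery scenarios only"
    return "outside guidance"


# Average latencies reported earlier in this thread.
reported = {"NC": 32, "UK": 125}
for site, ms in reported.items():
    print(f"{site}: {ms}ms -> {classify_latency(ms)}")
```

Both reported sites fall outside even the 20ms disaster recovery bound.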

 

Is there a specific reason why you deployed management servers across
geographic locations?  Even in SCOM 2007, this was not recommended.

 

From: [email protected]
[mailto:[email protected]] On Behalf Of Sven Wells
Sent: Friday, July 26, 2013 4:40 AM
To: [email protected]
Subject: [msmom] SCOM 2012 Some Mgt Svrs stopped inserting data into Ops
DB.

 

We noticed yesterday that three of our 12 Management Servers stopped
inserting data into the Ops DB at the same time a week ago.  There is no
Performance View data for these management servers, or their assigned
agents, after 18 Jul.

 

We are looking through logs to see what, if anything, occurred on that
date.

 

These three Management Servers are located geographically in different
places (2 in NC, 1 in Cambridge, UK) from the other 9 MSes.  The other 9
MSes are all located in Austin, TX.  All of our MSes are virtual.  The
Operations and Datawarehouse DBs are located in Austin, TX.

 

We recently attempted to create a Resource Pool which included the 3
geographically different MSes, and one or two of the Austin MSes, but
that did not seem to work as the Resource Pool would stop heartbeating.
After some advice that a Resource Pool should contain same-geographic
MSes, we removed those 3 MSes and replaced them with Austin MSes.

 

All Management Servers are showing Green/Healthy in the console.  All
management servers were working fine until 18 Jul 2013.

 

The agents assigned to those three management servers are all showing
Event ID 2120:

Event Type:      Warning
Event Source:    HealthService
Event Category:  Health Service
Event ID:        2120
Date:            7/19/2013
Time:            7:18:07 AM
User:            N/A
Computer:        XXXXXXX

Description:

The Health Service has deleted one or more items for management group
"XXXX" which could not be sent in 1440 minutes.

 

For more information, see Help and Support Center at
http://go.microsoft.com/fwlink/events.asp.

 

This event is occurring every hour and has been since 18 July 2013.
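
For reference, the 1440 minutes in the event description is 24 hours, i.e.
the agent is dropping items it has been unable to send for a full day. A
small Python sketch that pulls the figure out of the description text (the
function name is hypothetical; the description string is copied from the
event above):

```python
import re

# Description text copied from the Event ID 2120 entry above.
DESCRIPTION = (
    "The Health Service has deleted one or more items for management group "
    '"XXXX" which could not be sent in 1440 minutes.'
)


def queue_timeout_minutes(description):
    """Return the number of minutes quoted in a 2120 description, or None."""
    match = re.search(r"could not be sent in (\d+) minutes", description)
    return int(match.group(1)) if match else None


minutes = queue_timeout_minutes(DESCRIPTION)
print(f"{minutes} minutes = {minutes / 60} hours")
```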

 

Any thoughts as to why these three Management Servers have stopped
inserting data?  One of them has been completely rebooted, and the other
two had all of their System Center services stopped/restarted yesterday.

 

Thanks,

Sven

 

 

Sven Wells
SYSTEMS ADMINISTRATION SPECIALIST
TECHNOLOGY AND LABORATORY SVCS
Wilmington NC HQ

PPD

Phone +1 910 558 6870
[email protected]
www.ppdi.com

 

 


This email transmission and any documents, files or previous email
messages attached to it may contain information that is confidential or
legally privileged. 
If you are not the intended recipient or a person responsible for
delivering this transmission to the intended recipient, you are hereby
notified 
that you must not read this transmission and that any disclosure,
copying, printing, distribution or use of this transmission is strictly
prohibited. 
If you have received this transmission in error, please immediately
notify the sender by telephone or return email and delete the original
transmission and its attachments without reading or saving in any
manner.

 

 




