Yeah, that might be a possibility, but I have 150+ connections to the SQL
Server at that time, from all over the place, so tracking down the
offending query (if that is the issue) would take a lot of
needle-in-the-haystack work.

 

Again, for 23 hours and 55 minutes of the day the system is rock solid
with no issues. Why would a T-SQL query cause the network to barf,
especially when each system would need to flood a 1 Gbps switch port
connection (basically 125 MB/s)?
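The 125 MB/s figure is the nominal line rate divided by 8 bits per byte. It ignores Ethernet framing and TCP/IP overhead, so real sustained throughput on the port would be somewhat lower:

```latex
\frac{1\,\text{Gbit/s}}{8\,\text{bit/byte}}
  = \frac{10^{9}\,\text{bit/s}}{8\,\text{bit/byte}}
  = 1.25\times10^{8}\,\text{byte/s}
  = 125\,\text{MB/s}
```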

 

Plus, I would be seeing locking/blocking in the database at that time,
and it's quiet in that regard.

 

Z

 

Edward E. Ziots

CISSP, Network+, Security+

Network Engineer

Lifespan Organization

Email:[email protected]

Cell:401-639-3505

 

From: Jonathan Link [mailto:[email protected]] 
Sent: Friday, February 18, 2011 2:44 PM
To: NT System Admin Issues
Subject: Re: Sounding board on issue we are seeing with a Windows 2003
Cluster with SQL 2005

 

Pure speculation, but to me the time frame screams:

A user runs a manual query that, in their experience, takes a long time
to process (they don't know why), so they set it to start as they leave
for the day and then act on the results the next day...


 

On Fri, Feb 18, 2011 at 8:48 AM, Ziots, Edward <[email protected]>
wrote:

I have a two-node x64 Windows Server 2003 SP2 Enterprise Edition cluster
running SQL Server 2005 Standard Edition 64-bit.

 

What I am seeing is event IDs 1123 and 1124 in the event logs on each
cluster node, and we are getting complaints of disconnects from the
database.

 

We are seeing it happen around 5:50-6:00 PM each night (it shows in the
cluster log, and we have seen it via pings).

 

1)      We have ruled out the server backup, which runs at 3:30 AM (via
Legato).

2)      I have gone through the entire KB 892422, which covers these
errors, with Microsoft Support.

3)      I have swapped out the cables to the public and private NICs,
with no change.

4)      RSS/TCP Chimney are disabled in the registry and on the NICs on
each node.

5)      NIC drivers are the latest from the HP site (NC373i), and EMC
PowerPath 5.3 SP1 is installed for the SAN disk on each node.

 

Basically, we are pinging the owning node from our workstations, and we
lose about 5-10 pings during this window on both the primary and the
secondary nodes of the cluster (both are plugged into the same Cisco 45xx
switch).

 

We were also pinging each server from the other (both on the same
switch/VLAN), and we saw ping loss at the same time.
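To pin down the exact loss window and line it up against the cluster log entries, it can help to record every ping with a timestamp and collapse consecutive failures into outage windows. A minimal sketch in Python, assuming the results have already been captured as (timestamp, replied) pairs; the function name `loss_windows` and that input format are hypothetical, and the sample data below is synthetic:

```python
from datetime import datetime, timedelta


def loss_windows(samples, min_lost=1):
    """Collapse consecutive failed pings into (start, end, count) windows.

    samples: iterable of (datetime, bool) pairs in time order, where the
    bool is True if the ping got a reply (hypothetical input format).
    Runs shorter than min_lost consecutive failures are dropped.
    """
    windows = []
    start = end = None
    count = 0
    for ts, replied in samples:
        if not replied:
            # Open a new window on the first failure, extend it otherwise.
            if start is None:
                start = ts
            end = ts
            count += 1
        else:
            # A reply closes any open window.
            if start is not None and count >= min_lost:
                windows.append((start, end, count))
            start, end, count = None, None, 0
    # Close a window that runs to the end of the log.
    if start is not None and count >= min_lost:
        windows.append((start, end, count))
    return windows


if __name__ == "__main__":
    # Synthetic data: one ping per second, 7 lost starting at 17:52:10.
    t0 = datetime(2011, 2, 17, 17, 52, 0)
    samples = [(t0 + timedelta(seconds=i), not (10 <= i < 17)) for i in range(30)]
    for start, end, count in loss_windows(samples):
        print(f"{start:%H:%M:%S} - {end:%H:%M:%S}: {count} lost")
```

Comparing each window's start and end with the timestamps of events 1123/1124 would show whether the ping loss begins before or after the cluster notices the problem.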

 

The only ideas I have are to move the public NICs to another switch to
eliminate the switch as the point of contention, or to get new hardware,
migrate the databases off this cluster, and decommission it.

 

I checked other (32-bit) cluster nodes connected to these switches, and
we don't see this problem there.

 

Is there anything I might be missing or have overlooked? Questions, or
ideas to bounce off the wall, are appreciated...

 

Z

 


 

~ Finally, powerful endpoint security that ISN'T a resource hog! ~
~ <http://www.sunbeltsoftware.com/Business/VIPRE-Enterprise/>  ~

---
To manage subscriptions click here:
http://lyris.sunbelt-software.com/read/my_forums/
or send an email to [email protected]
with the body: unsubscribe ntsysadmin

 
