I agree. I would *think* that if it were a physical NIC problem, you would see
the dropped packets...

So your private NICs are directly connected? Are you running a trace on both
machines, watching the private NICs?

The specific timeframe of the disconnects doesn't point to a NIC problem
either. Hopefully you can move to another switch and see if that changes
anything. Any hope of getting the cluster onto an isolated switch? Is it in a
specific VLAN now?
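One way to line the loss up against that window on both nodes at once is to log every ping with a timestamp and then group the misses. Below is a minimal sketch in Python. It assumes you already have a ping loop writing (timestamp, replied) samples on each machine. `loss_windows` and `in_nightly_window` are hypothetical helper names, and the 17:50-18:00 bounds are simply the window described in this thread:

```python
from datetime import datetime, time
from typing import List, Tuple

# Hypothetical sample format: (timestamp, replied) pairs in time order,
# produced by whatever ping loop is already running on each machine.
Sample = Tuple[datetime, bool]
Window = Tuple[datetime, datetime, int]


def loss_windows(samples: List[Sample]) -> List[Window]:
    """Group consecutive missed pings into (first_miss, last_miss, count)."""
    windows: List[Window] = []
    start = last = None
    count = 0
    for ts, replied in samples:
        if not replied:
            if start is None:  # a new run of misses begins
                start, count = ts, 0
            last = ts
            count += 1
        elif start is not None:  # a reply ends the current run
            windows.append((start, last, count))
            start = None
    if start is not None:  # the log ended while still dropping
        windows.append((start, last, count))
    return windows


def in_nightly_window(ts: datetime,
                      begin: time = time(17, 50),
                      end: time = time(18, 0)) -> bool:
    """True if ts falls inside the suspect window (inclusive)."""
    return begin <= ts.time() <= end


if __name__ == "__main__":
    day = datetime(2011, 2, 17)
    samples = [
        (day.replace(hour=17, minute=49, second=58), True),
        (day.replace(hour=17, minute=50, second=0), False),
        (day.replace(hour=17, minute=50, second=1), False),
        (day.replace(hour=17, minute=50, second=2), True),
    ]
    for first, last, n in loss_windows(samples):
        print(first, last, n, in_nightly_window(first))
```

If you run the same grouping over logs from both nodes and a workstation, you can see whether the misses start at the same second everywhere, which would point toward the shared switch, or only on one box.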

 - WJR


On Fri, Feb 18, 2011 at 10:35, Ziots, Edward <[email protected]> wrote:

> Not going to tell anyone to STFU; that's why I am asking for a sounding
> board. Right now I am at my wits' end. I also agree on the switch issue: I
> ran across a few internet posts complaining about lost packets with NC373i
> and HP Broadcom NICs, and I have some action items to update the server
> BIOS and the NIC firmware to the latest supported versions to see if that
> helps. But I would definitely like to try moving to another switch first to
> eliminate the switch as the issue.
>
> Here is the real kicker, though: if it were a NIC issue (a physical NIC
> issue), wouldn't I also see dropped packets on the private NIC? We didn't
> see any (even though the private NICs are connected via a crossover cable).
>
> I also replaced the primary NIC cables and verified with a cable tester that
> the other cables are fine, so the only thing I can say right now is that it
> may be a NIC issue that isn't showing itself to me. (I am not sure how you
> could add another NIC to the server and make it the public NIC without
> breaking the cluster itself or bringing the clustered groups down.)
>
> Z
>
> Edward E. Ziots
> CISSP, Network +, Security +
> Network Engineer
> Lifespan Organization
> Email:[email protected]
> Cell:401-639-3505
>
>
> -----Original Message-----
> From: William Robbins [mailto:[email protected]]
> Sent: Friday, February 18, 2011 10:33 AM
> To: NT System Admin Issues
> Subject: Re: Sounding board on issue we are seeing with a Windows 2003
> Cluster with SQL 2005
>
> Also, that's a very specific timeframe... even if it's not backups on the
> cluster, could there be a backup or scheduled task on another server on the
> same switch in that timeframe?
>
> Feel free to tell me to STFU...I'm just spitballing.  :)
>
>  - WJR
>
>
>
> On Fri, Feb 18, 2011 at 07:48, Ziots, Edward <[email protected]> wrote:
> > I have a two-node x64 Windows 2003 SP2 Enterprise Edition cluster running
> > SQL 2005 Standard Edition 64-bit.
> >
> > What I am seeing is event IDs 1123 and 1124 in the event logs on each
> > cluster node, and we are getting complaints of disconnects from the
> > database.
> >
> > We are seeing it happen around 5:50-6:00pm each night (it shows in the
> > cluster log, and we have seen it via pings).
> >
> > 1)      We have ruled out the server backup, which happens at 3:30am
> > (via Legato).
> >
> > 2)      I have gone through all of KB 892422, which covers these errors,
> > with Microsoft Support.
> >
> > 3)      I have swapped out the cables to the public and private NICs, with
> > no change.
> >
> > 4)      RSS/TCP Chimney are disabled in the registry and on the NICs on
> > each node.
> >
> > 5)      NIC drivers are the latest from the HP site (NC373i), and we are
> > on EMC PowerPath 5.3 SP1 for the SAN disks on each node.
> >
> > Basically, we are pinging the owning node from our workstations, and we
> > lose about 5-10 pings during this window on both the primary and the
> > secondary nodes of the cluster (both are plugged into the same Cisco 45xx
> > switch).
> >
> > We were also pinging each server from the other (both are on the same
> > switch/VLAN), and we saw the ping loss at the same time.
> >
> > The only ideas I have are to move the public NICs to another switch, to
> > rule out the switch as the point of contention, or to get new hardware,
> > migrate the databases off this cluster, and decommission it.
> >
> > I checked other (32-bit) cluster nodes connected to these switches, and we
> > don't see this problem on them.
> >
> > Is there anything I might have missed or overlooked? Questions, or ideas
> > bounced off the wall, are appreciated...
> >
> > Z
> >
> > ~ Finally, powerful endpoint security that ISN'T a resource hog! ~
> > ~ <http://www.sunbeltsoftware.com/Business/VIPRE-Enterprise/>  ~
> >
> > ---
> > To manage subscriptions click here:
> > http://lyris.sunbelt-software.com/read/my_forums/
> > or send an email to [email protected]
> > with the body: unsubscribe ntsysadmin
>