The cluster is in a specific VLAN. The new switch doesn't have that VLAN sourced to it yet, but it will after a change. I want to try that first, before we move further.
I did catch a few things on Google about NC373i NICs and network issues. The only other thing I can see is that the NIC firmware and the BIOS firmware need an update on each node (about one revision behind). But both servers connected to the same switch, seeing the same problem at the same time, doesn't sound like a server issue to me, at least not right now.

Z

Edward E. Ziots
CISSP, Network +, Security +
Network Engineer
Lifespan Organization
Email:[email protected]
Cell:401-639-3505

From: William Robbins [mailto:[email protected]]
Sent: Friday, February 18, 2011 11:46 AM
To: NT System Admin Issues
Subject: Re: Sounding board on issue we are seeing with a Windows 2003 Cluster with SQL 2005

I agree. I would think if it were a physical NIC problem you could see the dropped packets... So your private NICs are direct connected? Are you running a trace on both machines watching the private NICs? The specific timeframe of the disconnect doesn't lend itself to a NIC problem either. Hopefully you can change switches to see if that changes anything. Any hope of getting the cluster on an isolated switch? Is it in a specific VLAN now?

- WJR

On Fri, Feb 18, 2011 at 10:35, Ziots, Edward <[email protected]> wrote:

Not going to tell anyone to STFU, it's why I am asking for a sounding board; right now I am at my wits' end. I also agree on the switch issue. I ran across a few internet posts complaining about NC373i and HP Broadcom NICs and lost packets, and I got some action items to update the BIOS on the server and the NIC firmware to the latest supported version and see if that helps. But I would definitely like to try moving to another switch first to eliminate the switch as the issue. Here is the real kicker though: if it was a NIC issue (a physical NIC issue), then wouldn't I also see the dropped packets on the private NIC, which we didn't see?
(Even though they are connected via a crossover cable.) Also, I replaced the primary NIC cables and verified the other cables are fine (cable tester), so the only thing I can say right now is that either it's a NIC issue not showing itself to me or it's the switch. (I am not sure how you could add another NIC to the server and then make it the public NIC without breaking the cluster itself or bringing the clustered groups down.)

Z

Edward E. Ziots
CISSP, Network +, Security +
Network Engineer
Lifespan Organization
Email:[email protected]
Cell:401-639-3505

-----Original Message-----
From: William Robbins [mailto:[email protected]]
Sent: Friday, February 18, 2011 10:33 AM
To: NT System Admin Issues
Subject: Re: Sounding board on issue we are seeing with a Windows 2003 Cluster with SQL 2005

Also, that's a very specific timeframe... even if it's not backups on the cluster, could there be a backup or scheduled task on another server on the same switch in that timeframe?

Feel free to tell me to STFU... I'm just spitballing. :)

- WJR

On Fri, Feb 18, 2011 at 07:48, Ziots, Edward <[email protected]> wrote:
> I have a two-node x64 Windows 2003 SP2 Enterprise Edition cluster running
> SQL 2005 Standard Edition 64-bit.
>
> What I am seeing is event IDs 1123 and 1124 in the event logs on each cluster
> node, and we are getting complaints of disconnects from the database.
>
> We are seeing it happen around 5:50-6:00pm each night (it shows in the
> cluster log and we saw it via pings).
>
> 1) We have eliminated the backup of the server, which happens at 3:30am
> (via Legato).
> 2) I have gone through the entire KB 892422, which covers these errors,
> with Microsoft Support.
> 3) I have switched out the cables to the public and the private NICs
> with no change in issues.
> 4) RSS/TCP Chimney are disabled in the registry and on the NICs on
> each node.
> 5) NIC drivers are the latest from the HP site (NC373i), and EMC PowerPath
> software 5.3 SP1 for the SAN disk on each node.
>
> Basically we are pinging the owning node server from our workstations and we
> lose about 5-10 pings during this time, on both the primary and the
> secondary nodes of the cluster (both are plugged into the same Cisco 45xx switch).
>
> We were also pinging each of the servers from the other (both on the same
> switch/VLAN) and we saw the ping loss at the same time there too.
>
> The only idea I had is to move the public NICs to another switch to eliminate
> the switch as the point of contention, or get new hardware, migrate the
> databases off this cluster, and decommission it.
>
> I checked other cluster nodes connected to these switches (32-bit) and we
> don't see this problem.
>
> Anything I might be missing or have overlooked? Questions, or bouncing some ideas
> off the wall, are appreciated...
>
> Z
>
> Edward E. Ziots
> CISSP, Network +, Security +
> Network Engineer
> Lifespan Organization
> Email:[email protected]
> Cell:401-639-3505
>
> ~ Finally, powerful endpoint security that ISN'T a resource hog! ~
> ~ <http://www.sunbeltsoftware.com/Business/VIPRE-Enterprise/> ~
>
> ---
> To manage subscriptions click here:
> http://lyris.sunbelt-software.com/read/my_forums/
> or send an email to [email protected]
> with the body: unsubscribe ntsysadmin
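A possible way to act on the ping observations in the thread: log timestamped ping results from each vantage point (workstation to owning node, node to node) and group consecutive losses into windows, so the 5:50-6:00pm drops can be lined up against the cluster log and the switch logs. The sketch below is only an illustration in Python. `loss_windows` and the `Sample` type are hypothetical names, not part of any tool mentioned in the thread, and the sample data is synthetic.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple

# Hypothetical record shape: (time the ping was sent, True if a reply came back).
Sample = Tuple[datetime, bool]


def loss_windows(samples: Iterable[Sample]) -> List[Tuple[datetime, datetime, int]]:
    """Group consecutive lost pings into (first_loss, last_loss, count) windows."""
    windows: List[Tuple[datetime, datetime, int]] = []
    start = last = None
    count = 0
    # Sort by timestamp only, so out-of-order log lines are handled.
    for ts, ok in sorted(samples, key=lambda s: s[0]):
        if not ok:
            if start is None:
                start = ts  # first lost ping of a new window
            last = ts
            count += 1
        elif start is not None:
            # A reply arrived, so the current loss window is closed.
            windows.append((start, last, count))
            start = last = None
            count = 0
    if start is not None:
        # The log ended while pings were still being lost.
        windows.append((start, last, count))
    return windows


if __name__ == "__main__":
    # Synthetic data: one ping per second for two minutes, 7 lost in a row.
    t0 = datetime(2011, 2, 17, 17, 49, 0)
    demo = [(t0 + timedelta(seconds=i), not (60 <= i < 67)) for i in range(120)]
    for first, final, lost in loss_windows(demo):
        print(f"{first:%H:%M:%S} to {final:%H:%M:%S}: {lost} lost")
```

Running the same grouping over logs taken from the workstation and from each node would show whether the loss windows start and stop together, which is the pattern that points at the shared switch rather than at a single server.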
