I worked on a network move for a brokerage company last week and 
encountered a VERY strange problem.

We moved a bunch of equipment to a new office building.  During the 
process, we changed the internal network from 192.168.100.0/24 to 
172.31.4.0/22.
The company has four Cisco 3500XL 48-port switches, with no VLANs and plain 
vanilla configurations.  The fanciest thing is portfast on the client 
machine ports.
Switches are linked via GBICs in a cascade.  There is one client-maintained 
router that sits before the firewall with only static routes and no routing 
protocols.
There are multiple outside vendor routers for specific applications 
(real-time quotes, clearinghouse mainframe, etc.), but these also have 
only static routes and no routing protocols.

After installing all of the network equipment and servers, we started to 
turn on clients and get new DHCP addresses.  Since the new network was 
172.31.4.0/22, 172.31.4.1 - 172.31.4.255 was reserved for servers, 
printers, switches, and routers.  The remaining 172.31.5.0 - 172.31.7.254 
was reserved for clients...though there are only about 100 clients at the 
moment and thus they only took 172.31.5.0 - 172.31.5.100 or so in DHCP.
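
As a quick sanity check of the addressing plan above, here is a minimal Python sketch using the standard ipaddress module. The helper name block_summary is hypothetical and only there for illustration; the addresses are the ones quoted above.

```python
import ipaddress

# The new internal network described above.
net = ipaddress.ip_network("172.31.4.0/22")

def block_summary(network):
    """Return (netmask, first usable host, last usable host, broadcast)."""
    first = network.network_address + 1
    last = network.broadcast_address - 1
    return (str(network.netmask), str(first), str(last),
            str(network.broadcast_address))

summary = block_summary(net)
print(summary)

# Both the server block and the client block should fall inside the /22.
for addr in ("172.31.4.1", "172.31.4.255", "172.31.5.0", "172.31.7.254"):
    print(addr, ipaddress.ip_address(addr) in net)
```

Under a /22 mask, 172.31.4.255 and 172.31.5.0 are ordinary host addresses; only 172.31.4.0 and 172.31.7.255 are the network and broadcast addresses.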

After installing maybe 20 clients or so, we started to see mass slowdowns 
on the network.  Pings between clients and servers were very irregular and 
intermittent.  There was no discernable pattern to when pings would succeed 
and when they'd fail.  We exhaustively went through all devices and made 
sure that they'd been correctly set to the new mask and that all server 
functions (DNS, WINS, AD, etc.) had been correctly set up for the new 
subnet.  Everything looked fine.  In an effort to troubleshoot, we unhooked 
the switch stack and put core servers and a few clients on a single 
switch.  Again, communication was irregular and unpredictable, whether with 
static or DHCP addresses on the clients.  Sometimes things would be fine, 
other times clients could ping the server, but not the switch to which they 
were attached.  Sometimes clients could ping the switch, but not the 
server.  Sometimes the clients could ping neither.  Again, there seemed to 
be no pattern.  Thinking there might have been some IOS bug, we erased 
nvram, upgraded the switches to current IOS code, and put in a completely 
plain configuration.  This had no effect on the problem.

After 4 of us (with probably 50 years of industry experience between us) 
spent 15 hours or so trying to resolve the issue, I finally suggested we 
try moving the clients from the 172.31.5.x/22 block to the 172.31.4.x/22 
block.  This solved all problems, and all clients were able to ping both 
switches and servers 100% of the time.  Again, we didn't change the mask on 
anything, only the third octet of the client IP range.  We then went back 
and triple checked every device attached to the network....servers, 
routers, switches, printers, clients, etc.  Every single device had the 
correct mask (/22) except for two vendor-maintained UNIX boxes...they had 
172.31.4.x/24.  We suspected as much earlier since clients couldn't 
communicate with the UNIX boxes from the beginning, but the other servers 
could communicate with the UNIX boxes without issue.  These UNIX servers 
weren't running RIP (or any other routing protocol)...and besides, there aren't any other 
network devices listening for RIP....so we weren't really concerned about 
them causing the network connectivity issues.  At the time, I couldn't see 
how a bad mask on these boxes could effectively make the whole network 
unusable, so I didn't bother correcting it early in the day.
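
For what it's worth, the one symptom a /24 mask on a 172.31.4.x host would plausibly account for is the client-to-UNIX failure noted above. Such a host treats 172.31.5.x as off-link and sends that traffic to whatever gateway its routing table names, instead of ARPing for the client directly. Below is a minimal sketch, assuming hypothetical host addresses (the post only gives 172.31.4.x and 172.31.5.x) and a hypothetical helper on_link. It does not by itself explain the wider outage.

```python
import ipaddress

def on_link(host_interface, destination):
    """True if a host configured as host_interface (address/prefix)
    would consider destination to be on its own subnet."""
    iface = ipaddress.ip_interface(host_interface)
    return ipaddress.ip_address(destination) in iface.network

# Hypothetical addresses, chosen only to match the ranges in the post.
unix_bad_mask = "172.31.4.50/24"   # UNIX box as found
unix_good_mask = "172.31.4.50/22"  # UNIX box with the intended mask
server = "172.31.4.10"             # another server in the 4.x block
client = "172.31.5.20"             # a DHCP client in the 5.x block

results = {
    "bad_mask_sees_server": on_link(unix_bad_mask, server),
    "bad_mask_sees_client": on_link(unix_bad_mask, client),
    "good_mask_sees_client": on_link(unix_good_mask, client),
}
print(results)

# The /24 host also believes the subnet broadcast is 172.31.4.255,
# which is an ordinary host address under the /22 mask.
bad_broadcast = ipaddress.ip_interface(unix_bad_mask).network.broadcast_address
print(bad_broadcast)
```

This matches the observation that the other servers (in 172.31.4.x) could reach the UNIX boxes while the clients in 172.31.5.x could not.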

At this point, I've had a week to think about the issue and I still don't 
have a logical reason for why this problem might have occurred.  Anyone out 
there have any thoughts?
I'm going back to put in a 3550EMI as the core in a couple of weeks.  At 
that point, we're going to investigate more and try to move the clients 
back to the 172.31.5.x range.  I'd like to test theories at that time if 
anyone can put one forward that we didn't already test....as I said, we 
spent a lot of time on this and I didn't put every test we did in this 
email.  All I can offer is that it wasn't IOS code (we tried more than one 
version), it wasn't the switches (we tried several, including non-Cisco), 
it wasn't DNS, WINS, DHCP, or any other server side issue (we thoroughly 
examined and ruled those out...besides, this was even happening at the IP 
level between switches).  Everything had worked correctly at the old 
building...the only two things that changed significantly during the move 
were the IP range and the building wiring.  AND, the wiring in the new 
building was brand new Cat6...I even dug out the WireScope and verified 
that the drops passed spec.

Thanks!
Craig



