192.168.1.0/24 in Seattle via T1 (384K data bandwidth) (that's where I am)
192.168.2.0/24 in Portland via T1 (768K data bandwidth)
192.168.3.0/24 in Boise via DSL (768K data bandwidth)
Two weeks ago, we had to close the Portland office, so that router is no longer part of the network. About three weeks ago, the Boise network (three users, five desktops, two networked printers, a Linux fileserver, and a 12-port HP ProCurve 2424 switch) started dropping packets. It was no big deal at first, but the users noticed that in the mornings it took a long time to access the Seattle fileserver (identical to the one in Boise), and sometimes they could not send email or access websites. Most afternoons, the problem would clear up by itself.
Pings to the DSL router (Flowpoint 2200, 64.113.213.13) showed no dropped packets at all. Pings to the LEAF router outside address (64.113.213.14) would drop 3% to 5% in the mornings, and 0% to 5% in the afternoons. Pings to the inside network would drop 10% to 60% in the mornings, and 0% to 10% in the afternoons. Within a few days the problem worsened, with as much as 85% dropped packets to the inside addresses in the mornings, but still clearing up most days by afternoon. On the weekend, the problem all but disappeared, but it returned Monday morning.
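The morning/afternoon pattern described above is easier to confirm when loss readings are logged with timestamps and then grouped. The following is a minimal sketch, assuming a 12:00 cutoff between "morning" and "afternoon"; the function name `summarize_by_period` and the sample readings are hypothetical, chosen only to illustrate the grouping, and are not measurements from this network.

```python
from datetime import datetime


def summarize_by_period(samples):
    """Group (datetime, loss_percent) readings into morning/afternoon ranges.

    Returns a dict mapping the period name to a (lowest, highest) loss pair.
    The 12:00 cutoff is an assumption made for this sketch.
    """
    ranges = {}
    for when, loss in samples:
        period = "morning" if when.hour < 12 else "afternoon"
        low, high = ranges.get(period, (loss, loss))
        ranges[period] = (min(low, loss), max(high, loss))
    return ranges


# Hypothetical readings, for illustration only.
samples = [
    (datetime(2005, 1, 10, 9, 0), 10.0),
    (datetime(2005, 1, 10, 10, 30), 60.0),
    (datetime(2005, 1, 10, 14, 0), 0.0),
    (datetime(2005, 1, 10, 16, 0), 10.0),
]

for period, (low, high) in summarize_by_period(samples).items():
    print(f"{period}: {low:.0f}% to {high:.0f}% loss")
```

Feeding it real readings taken every few minutes over several days would show whether the loss really tracks office hours or only appears to.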
I verified with the ISP (Transedge, great customer service, highly recommend) that there was no problem up to the DSL router. I had the Boise staff temporarily replace the LEAF router with a Win98 box set to the router's outside address (64.113.213.14), and it dropped no packets at all. We replaced all network cables attached to the routers. I immediately tested and shipped a replacement router to them. I talked them through setting up the new router, using the same CD and floppy from the old router, and had them ship the old router to me.
The problems did not go away, and got worse as the days passed. I received and tested the old router, and it worked fine. Head-scratching time.
I had the Boise staff shut off all networked devices except one of the printers. The problem did not go away. I had them pull the network cable from the switch to the LEAF router. Still dropped packets.
As of today, the network is virtually inaccessible from the outside. Pings to the DSL router are still fine:
ITPB:~ dale$ ping -c 10 64.113.213.13
PING 64.113.213.13 (64.113.213.13): 56 data bytes
64 bytes from 64.113.213.13: icmp_seq=0 ttl=238 time=83.287 ms
64 bytes from 64.113.213.13: icmp_seq=1 ttl=238 time=82.428 ms
64 bytes from 64.113.213.13: icmp_seq=2 ttl=238 time=82.916 ms
64 bytes from 64.113.213.13: icmp_seq=3 ttl=238 time=82.382 ms
64 bytes from 64.113.213.13: icmp_seq=4 ttl=238 time=83.119 ms
64 bytes from 64.113.213.13: icmp_seq=5 ttl=238 time=82.121 ms
64 bytes from 64.113.213.13: icmp_seq=6 ttl=238 time=84.343 ms
64 bytes from 64.113.213.13: icmp_seq=7 ttl=238 time=83.358 ms
64 bytes from 64.113.213.13: icmp_seq=8 ttl=238 time=81.6 ms
64 bytes from 64.113.213.13: icmp_seq=9 ttl=238 time=80.802 ms
--- 64.113.213.13 ping statistics ---
10 packets transmitted, 10 packets received, 0% packet loss
round-trip min/avg/max = 80.802/82.635/84.343 ms
Pings to the LEAF router outside address drop the majority of packets:
ITPB:~ dale$ ping -c 20 64.113.213.14
PING 64.113.213.14 (64.113.213.14): 56 data bytes
64 bytes from 64.113.213.14: icmp_seq=0 ttl=237 time=81.561 ms
64 bytes from 64.113.213.14: icmp_seq=9 ttl=237 time=82.785 ms
64 bytes from 64.113.213.14: icmp_seq=11 ttl=237 time=83.254 ms
64 bytes from 64.113.213.14: icmp_seq=15 ttl=237 time=83.496 ms
64 bytes from 64.113.213.14: icmp_seq=19 ttl=237 time=84.834 ms
--- 64.113.213.14 ping statistics ---
20 packets transmitted, 5 packets received, 75% packet loss
round-trip min/avg/max = 81.561/83.186/84.834 ms
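The sequence numbers in that run show which probes were lost, not just how many. Here is a minimal sketch that lists the missing `icmp_seq` values and the longest consecutive gap, assuming ping output in the format shown above with sequence numbers starting at 0; the helper names `missing_sequences` and `longest_gap` are made up for this example.

```python
import re


def missing_sequences(ping_output, count):
    """Return the icmp_seq numbers (starting at 0) that never got a reply."""
    seen = {int(seq) for seq in re.findall(r"icmp_seq=(\d+)", ping_output)}
    return [seq for seq in range(count) if seq not in seen]


def longest_gap(missing):
    """Length of the longest run of consecutive missing sequence numbers."""
    longest = run = 0
    previous = None
    for seq in missing:
        if previous is not None and seq == previous + 1:
            run += 1
        else:
            run = 1
        longest = max(longest, run)
        previous = seq
    return longest


# Reply lines copied from the 20-probe run against 64.113.213.14.
sample = """\
64 bytes from 64.113.213.14: icmp_seq=0 ttl=237 time=81.561 ms
64 bytes from 64.113.213.14: icmp_seq=9 ttl=237 time=82.785 ms
64 bytes from 64.113.213.14: icmp_seq=11 ttl=237 time=83.254 ms
64 bytes from 64.113.213.14: icmp_seq=15 ttl=237 time=83.496 ms
64 bytes from 64.113.213.14: icmp_seq=19 ttl=237 time=84.834 ms
"""

lost = missing_sequences(sample, 20)
print("lost probes:", lost)
print("longest run of consecutive losses:", longest_gap(lost))
```

For this run, probes 1 through 8 were lost back to back, so the loss comes in bursts rather than being spread evenly. The round-trip times of the replies that do arrive are normal.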
Pings to the inside LEAF router address (192.168.3.254) are never returned:
ITPB:~ dale$ ping -c 200 192.168.3.254
PING 192.168.3.254 (192.168.3.254): 56 data bytes
--- 192.168.3.254 ping statistics ---
200 packets transmitted, 0 packets received, 100% packet loss
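The three ping runs above can be compared side by side to see where the loss begins. This is a minimal sketch, assuming summary lines in the format shown and targets listed from nearest to farthest; `parse_loss`, `first_lossy_target`, and the 1% threshold are illustrative choices, not part of ping or LEAF.

```python
import re

# Matches summary lines such as
# "20 packets transmitted, 5 packets received, 75% packet loss".
SUMMARY = re.compile(
    r"(\d+) packets transmitted, (\d+) (?:packets )?received, ([\d.]+)% packet loss"
)


def parse_loss(ping_output):
    """Extract the packet-loss percentage from a ping summary."""
    match = SUMMARY.search(ping_output)
    if match is None:
        raise ValueError("no ping summary line found")
    return float(match.group(3))


def first_lossy_target(results, threshold=1.0):
    """Return the first target whose loss exceeds threshold, or None.

    results is a list of (label, ping_output) pairs ordered from the
    nearest target to the farthest.
    """
    for label, output in results:
        if parse_loss(output) > threshold:
            return label
    return None


# Summary lines taken from the three runs in this message.
results = [
    ("DSL router 64.113.213.13",
     "10 packets transmitted, 10 packets received, 0% packet loss"),
    ("LEAF outside 64.113.213.14",
     "20 packets transmitted, 5 packets received, 75% packet loss"),
    ("LEAF inside 192.168.3.254",
     "200 packets transmitted, 0 packets received, 100% packet loss"),
]

for label, output in results:
    print(f"{label}: {parse_loss(output):.0f}% loss")
print("loss first appears at:", first_lossy_target(results))
```

With these numbers the loss first shows up at the LEAF router's outside address, which points at the segment between the DSL router and the LEAF box rather than at anything upstream.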
nmap -sP 192.168.3.0/24 runs about two minutes (compared to the usual 10 - 15 seconds) and returns "0 hosts up."
As a consequence, I've lost the ability to ssh into the LEAF router or the Linux fileserver in Boise. I'm flying to Boise on Wednesday, but I really don't know what to look for as a solution:
1. I've never dropped a packet sent to the DSL router so the ISP appears to be blameless.
2. The Boise LEAF hardware has been replaced with a tested machine and it's been verified that there was nothing wrong with the one that was replaced.
3. I've never known a problem with LEAF software to survive a reboot.
4. The problem persists even with no client machines operating on the private side of the router.
I really don't know where to go from here. These machines were so easy to set up and they have worked so well that I have never had to troubleshoot them before. I know how to use ping and fping, and a bit about nmap (but not much). Mainly, apart from a bad network cable, a bad NIC in the router, or a virus or adware on the network, I don't have any idea what could cause something like this in the first place, and all of those possibilities have been eliminated to my satisfaction.
Thanks in advance for any advice.
Dale Mirenda
------------------------------------------------------------------------
leaf-user mailing list: https://lists.sourceforge.net/lists/listinfo/leaf-user
SR FAQ: http://leaf-project.org/pub/doc/docmanager/docid_1891.html