Daniel Ouellet wrote:
I am not saying it's the same problem here, but it sure behaves the exact same way. See whether you have timeouts in the logs from that hme driver. But until you do more tests on your box, it will not be looked at, that's for sure.

I really hope it helps you nevertheless and gives you some ideas to try. The best way to get help is to help yourself first and really try many things; then you have more valuable data to use and report with.

Thanks for the suggestions. I have tried quite a few things so far, and will continue to test. Since this is a production environment, I can only work on it off-hours. The problem is intermittent and not always easy to reproduce, so it will take some time. Last night I got it to hang several times, but later the same load tests running for hours were not triggering it. The host and switch are forced to 100Mbps/Full and I've installed new cables. I tried several other switches and ports, and no matter what I do, even if it doesn't hang I always get interface errors. With gem it was ierrs, with hme it was oerrs, and I even tried an fxp card from a Compaq server and with that I get both ierrs and oerrs, but only a fraction of a percent of total packets. Today so far I have 160 errors out of 30M packets along with some input and CRC errors on the Cisco switch port. I would think these counters should be 0 unless something is really wrong. But throughput seems fine.
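
For scale, here is a quick sketch of the arithmetic behind "a fraction of a percent". It assumes the 30M figure is a round 30,000,000 packets, which is only an approximation, and the function name is purely illustrative:

```python
def error_rate_percent(errors, packets):
    """Return interface errors as a percentage of total packets."""
    return 100.0 * errors / packets


# Figures from today's counters; 30M is treated as exactly 30,000,000 (an assumption).
errors = 160
packets = 30_000_000

rate = error_rate_percent(errors, packets)
print(f"{rate:.5f}% of packets, about 1 in {packets // errors:,}")
# prints "0.00053% of packets, about 1 in 187,500"
```

So the rate is tiny, but it is still not the zero one would expect from a clean full-duplex link.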

So right now I'm focusing more on trying to eliminate the error counts while giving the system "time" to get to the point where it may hang. No idea if the two issues are related. I'm definitely not giving up or expecting someone else to do all the troubleshooting...I was just hoping that maybe someone either knew about a fix or could give me some ideas where to look. Now that I have some more ideas, I will continue trying new things and look for an answer.

A question about logging--why do I not get any log entries or console messages about these failures? Almost everyone else that had these kinds of problems had log messages. Is there a way to enable verbose logging in the network drivers? Or is there something I can do or capture when the failure is occurring that would help me see what is going on? Without that, I feel like I'm just guessing.
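
Independent of what the drivers themselves can log, one thing that could be captured is a timestamped record of the interface counters leading up to a hang. The following is only a minimal sketch in Python, assuming `netstat -i` is the counter source; the function names, log path, count, and interval are hypothetical placeholders, not anything provided by the drivers:

```python
import subprocess
import time


def capture(cmd):
    """Run cmd and return its stdout prefixed with a timestamped header line."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    result = subprocess.run(cmd, capture_output=True, text=True)
    return f"=== {stamp} {' '.join(cmd)} ===\n{result.stdout}"


def log_snapshots(cmd, path, count, interval_s):
    """Append `count` snapshots of cmd's output to path, interval_s seconds apart."""
    with open(path, "a") as log:
        for i in range(count):
            log.write(capture(cmd))
            # Push each snapshot out of Python's buffer right away so that
            # as much as possible is already written if the box hangs.
            log.flush()
            if i < count - 1:
                time.sleep(interval_s)
```

For example, `log_snapshots(["netstat", "-i"], "/var/tmp/netstat-i.log", 720, 5)` would record the counters every 5 seconds for about an hour (the path and numbers are arbitrary). Comparing the last snapshots before a hang with earlier ones would at least show whether the error counters jump right before the failure.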

Bryan
