Did you try the MP kernel to see if that makes a difference for you?

Out of curiosity, what effect would this have on a single CPU box?

It means using a different kernel with different options compiled in.

For me at the time, the MP kernel didn't have the problem that the SP kernel had, and looking at the differences between them pointed in one direction for the patch at the time.

That's why I asked if you tried it.

The bottom line is that the MP kernel does work on a single-core processor. It's just like having a CPU with only one core, really. There is nothing wrong with trying it; it will not kill your box. (;>

Also, don't forget that the fix here is not in 4.5, but past 4.5.

And is there anything in your logs, a timeout message maybe?

And 4.6 is really around the corner now. Might be best to run it and see.

I know the fix for gem is in 4.6, but does the same problem affect hme? Since I'm having the problem with both drivers, I'm not sure if the 4.6 fix is related to the problem I'm seeing. Unlike your experience, I'm not getting any error messages in any logs or on the console. The only clue is the ierrs/oerrs and some error counts on the switch.

There might be the same type of watchdog issue in hme that there was in gem. I can't tell you for sure, but the bottom line here as well is that if you really want to find a problem, or possibly a bug, then as explained in the FAQ you need to try the latest snapshot first and report whether it still has the problem or not. There have been so many changes in it lately. Your problem may well be gone, or still present, but you need to help yourself and try to find out more, and the start of that is to try all you can, which you still haven't done. Don't forget, you are the one with the problem, not the devs, yet you would like them to look into it. Start by providing valuable details, and maybe if someone has time or an idea, he or she might look into it. But you need to provide more details first and, at a minimum, try to isolate it. Many tests don't require being a programmer and still provide valuable details. For all anyone knows, the problem may well be fixed by now, or not.

I was able to kill the interface several times by pushing data through the firewall (into hme0 and out hme1) at around 70Mbps for 5-10 minutes. Same result--hme1 stopped responding but I could ping hosts on the hme0 side. I'm fairly sure (it was a long night...) that one time I did the ifconfig down/up on *hme0* and that revived hme1, which seemed odd.

I am not saying it's the same problem here, but it sure behaves the exact same way. See whether or not you have timeouts in the logs from that hme driver. But until you do more tests on your box, it will not be looked at, that's for sure.
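
If it helps, here is a rough sketch of that log check in Python. It assumes you have already pulled the kernel messages into a string (from dmesg output or your messages file, for example) and it only looks for lines that mention an hme interface followed by the word "timeout". The exact wording of the driver message is an assumption on my part, so the match is kept loose, and the sample lines are made up for illustration.

```python
import re

# Loose pattern: an hme interface name followed, later on the same line,
# by the word "timeout". The real driver wording is an assumption.
HME_TIMEOUT = re.compile(r"\b(hme\d+)\b.*\btimeout\b", re.IGNORECASE)


def find_hme_timeouts(log_text):
    """Return (interface, line) pairs for log lines that look like hme timeouts."""
    hits = []
    for line in log_text.splitlines():
        match = HME_TIMEOUT.search(line)
        if match:
            hits.append((match.group(1), line.strip()))
    return hits


# Hypothetical sample, not real log output.
SAMPLE_LOG = """\
hme0 at pci0 dev 1 function 1
hme1: device timeout
pf: state table updated
"""

if __name__ == "__main__":
    for ifname, line in find_hme_timeouts(SAMPLE_LOG):
        print(ifname, "->", line)
```

An empty result does not prove the driver is fine; it only tells you that nothing matching that loose pattern was logged.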

I ran "systat ifstat" during the failure, and it showed data flowing inbound through the firewall into hme0 and out hme1, but nothing in the other direction. So hme1 seems to be half working. Not sure if it matters, but I'm using altq with hfsc.

It may be an auto duplex negotiation issue, or not. But did you try to see if that might help or even make a difference? Just try to think of all the possibilities and test some. Like a different switch, or fixing the port speed on the switch and the hme card just to test. Try the MP kernel, try a snapshot (and if you do, don't forget that changes were made in PF that may affect you and require changes to the PF configuration in 4.6). Then and only then will you have more data to report and maybe be able to look into what the issue might be.

Hope this helps you some and gives you some tests that really ought to be done to be helpful.

Just think about it as it is now. You report an issue, but it would be much more helpful if there were a case that removes the issue, so that the differences between the two setups could be looked at. For all we know now, it may just be a switch port issue, really. I am not saying it is, but it could be, as that's the one element in the picture that is the same as before on one end of it.

I know you have had that for many weeks now, based on your previous email, so you are trying to isolate it, which is good, but then go all the way to find it and really try more stuff than what you do now. You may fix it real quick doing so and wonder afterwards why you didn't do it sooner.

I really hope it helps you nevertheless and gives you some ideas to try. The best way to get help is to help yourself first and really try many things, and then you have more valuable data to use and report with.
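
To make that kind of comparison concrete, something like the following Python sketch could work. It assumes you write down the ierrs/oerrs counters per interface by hand before and after each test run (from netstat -i, for instance). The function name and all the numbers are made up for illustration; they are not from your box.

```python
def error_deltas(before, after):
    """Return {ifname: (ierrs_delta, oerrs_delta)} for interfaces in both samples.

    Each sample maps an interface name to an (ierrs, oerrs) tuple.
    """
    deltas = {}
    for ifname, (ierrs_after, oerrs_after) in after.items():
        if ifname not in before:
            continue  # interface was not recorded in the first sample
        ierrs_before, oerrs_before = before[ifname]
        deltas[ifname] = (ierrs_after - ierrs_before, oerrs_after - oerrs_before)
    return deltas


if __name__ == "__main__":
    # Hypothetical counters for two test setups, e.g. autonegotiation
    # versus fixed speed/duplex on the switch port and the card.
    autoneg = error_deltas({"hme0": (10, 2), "hme1": (5, 40)},
                           {"hme0": (10, 2), "hme1": (9, 140)})
    fixed = error_deltas({"hme0": (10, 2), "hme1": (9, 140)},
                         {"hme0": (10, 2), "hme1": (9, 140)})
    for ifname in sorted(autoneg):
        print(ifname, "autoneg:", autoneg[ifname], "fixed:", fixed.get(ifname))
```

If one setup shows the counters climbing and the other doesn't, that difference is exactly the kind of detail worth reporting.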

Best,

Daniel
