Sylwester S. Biernacki wrote:
Hello,

  about a month ago I wrote that I'm glad about the em(4) driver, which
  works pretty well on a few of my boxes. However, I need to change my
  opinion... after what I saw today in the lab:
> [ ... cut ... ]

I wanted to reply to the relevant sections, but your message is quite long, so excuse my lack of netiquette: I'll just braindump.

First of all,
"net.inet.tcp.recvspace" and ".sendspace" have nothing to do with your test scenario as long as the box only routes packets. They only affect the box's own TCP stack and have no effect on forwarded packets.

You gave nice numbers such as total bandwidth and payload size. It's not hard to calculate the packet rate from those, and the packet rate is the only thing that really matters here. 50,000 packets/sec can easily kill an OpenBSD box, even with an MP kernel employing the I/O APIC.
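To give a feel for it, a back-of-the-envelope example with made-up figures (since I cut yours), on a 100 Mbit/s link:

    1518-byte frames: 100,000,000 / ((1518 + 20) * 8) ~=   8,100 packets/sec
      64-byte frames: 100,000,000 / ((  64 + 20) * 8) ~= 148,800 packets/sec

(the extra 20 bytes are preamble plus inter-frame gap). The same "bandwidth" can be either a harmless trickle of big packets or a flood of small ones, far past the rate that hurts.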

I don't know whether the em(4) driver fully employs the MAC chip's interrupt coalescing capabilities, but it's mostly imported from its FreeBSD counterpart, which is contributed and maintained by Intel, AFAIK.

Although I don't have a sound explanation for your "100% idle but still freezing" case, generally the problem is an interrupt storm.

I would suggest increasing the interface queue length, though it won't help if the box is freezing. Anyway, the sysctl is "net.inet.ip.ifq.maxlen"; the default value is 50, and 250 seems like a safe value for many em(4)s. This would reduce the packets dropped when the queue capacity is exhausted.
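To try it, something like this (just a sketch; tune the value to your load):

    # at runtime, as root
    sysctl -w net.inet.ip.ifq.maxlen=250

    # and to keep it across reboots, add to /etc/sysctl.conf:
    net.inet.ip.ifq.maxlen=250

Watching net.inet.ip.ifq.drops before and after will tell you whether that queue was actually overflowing.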

It's a complicated issue; it's not black and white. The problem resides in the hardware, in the driver itself, and in the kernel (the way it handles packets, interrupts, the network stack, etc.), and unfortunately there's no simple solution for any of them.
Just adding polling support to the kernel won't magically make things outperform. AFAIK, we do not have access to the em(4) hardware/driver developer's manuals (is that true, Brad?), so it's really hard to make the driver better...

Last but not least, Intel's cards are not that awesome. There are much cheaper, less bloated, and well-performing MAC chipsets around. Many of them are custom designs for high-end networking gear, which is also powered by ASICs. SysKonnect is promising and commercially available, too; I remember good comments about sk(4) from Henning.

Sorry, but we have to accept the fact: we're trying to handle exceptional I/O loads on machines that were not designed to handle that much, especially on the x86 platform. The rumors that amd64 handles I/O better didn't hold for me; my tests showed tiny regressions.

I had similar problems with many different setups and tried many things: making the NICs _share_ the PCI interrupt, UP and MP kernels, different ifq lengths, different chipsets... The only major regressions came from the MP kernel (employing the I/O APIC) and from using sensible ifq lengths.

Good luck in your quest. Let us know if you manage to make things better.
