Sylwester S. Biernacki wrote:
> Hello,
> about a month ago I wrote I'm glad about em(4) driver which works
> pretty well on few of my boxes. However I need to change my
> opinion... after what I saw today in the lab:
> [ ... cut ... ]
I wanted to reply to the relevant sections, but your message is quite
long, so excuse my lack of netiquette while I braindump.
First of all,
"net.inet.tcp.recvspace" and ".sendspace" have nothing to do with your
test scenario as long as the box only routes packets. They only concern
the box's own TCP stack and have no effect on forwarded packets.
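If you want to double-check that on the router itself, the knobs are
plain sysctls (nothing here beyond stock sysctl(8)):

  # these only size the socket buffers of TCP connections that terminate
  # on this box (ssh sessions, a local web server, ...); packets you
  # merely forward never go through them
  $ sysctl net.inet.tcp.recvspace
  $ sysctl net.inet.tcp.sendspace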
You have useful numbers such as total bandwidth and payload size, so it's
not hard to calculate the packet rate, and the packet rate is what really
matters. 50,000 packets/sec can easily kill an OpenBSD box, even with an
MP kernel employing the I/O APIC.
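To make that concrete, a back-of-the-envelope example (the throughput and
frame size below are made up for illustration, since your actual figures
got cut above):

  # pps = throughput in bits/s divided by (frame size in bytes * 8)
  # e.g. 300 Mbit/s of 750-byte frames:
  $ echo $((300000000 / (750 * 8)))
  50000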
I don't know whether the em(4) driver fully employs the MAC chip's
interrupt coalescing capabilities, but it's mostly imported from its
FreeBSD counterpart, which is contributed and maintained by Intel, AFAIK.
Although I don't have any sound explanation for your "100% idle but
still freezing" case, the problem is generally an interrupt storm.
I would suggest increasing the interface queue length, though it won't
help if the box is freezing. The sysctl is "net.inet.ip.ifq.maxlen"; the
default value is 50, and 250 seems like a safe value for many em(4)s.
This should reduce the packets dropped when the input queue runs out of
capacity.
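Roughly like this (a sketch with plain sysctl(8) and /etc/sysctl.conf;
the drops counter next to maxlen is from memory; '#' is a root prompt):

  $ sysctl net.inet.ip.ifq.drops        # is the input queue really overflowing?
  # sysctl net.inet.ip.ifq.maxlen=250   # takes effect immediately
  # echo net.inet.ip.ifq.maxlen=250 >> /etc/sysctl.conf    # and across reboots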
It's a complicated issue, and it's not black and white. The problem
resides in the hardware, in the driver itself and in the kernel (the way
packets, interrupts, the network stack, etc. are handled), and
unfortunately there's no simple solution for any of them.
Just adding polling support to the kernel won't magically make things
outperform. AFAIK we do not have access to the em(4) hardware/driver
developer's manuals (is that true, Brad?), so it's really hard to make
the driver better... Last but not least, Intel's cards are not that
awesome. There are much cheaper, less bloated and well-performing MAC
chipsets around; many of them are custom designs for high-end networking
gear, which is also powered by ASICs.
SysKonnect is promising and commercially available too. I remember good
comments about sk(4) from Henning.
Sorry, but we have to accept the fact: we're trying to handle exceptional
I/O loads on machines that were not designed to handle that much,
especially on the x86 platform. The rumors that amd64 handles I/O better
didn't hold for me; my tests showed tiny regressions.
I had similar problems with many different setups and tried many things:
making the NICs _share_ the PCI interrupt, UP and MP kernels, ifq
lengths, different chipsets... The only changes that made a major
difference were the MP kernel (employing the I/O APIC), which was a clear
regression, and using sensible ifq lengths.
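One thing that helped me narrow such cases down: watch the per-device
interrupt rates while the test is running. Nothing fancier than vmstat(8)
is needed; which lines to look at depends on your dmesg, of course:

  # one line per interrupt source, with total count and per-second rate;
  # if the em(4) lines sit at tens of thousands per second, you are
  # looking at an interrupt storm rather than a packet-processing limit
  $ vmstat -i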
Good luck in your quest. Let us know if you manage to make things better.