Hi all,

unfortunately it took us quite a while to finish the new setup of our system, 
but for the past couple of days we have been running tests again. The results 
are good, except for one major problem (see below).

For the first tests, we used four quad-port nxge 1 Gbit NICs, with one port on 
each used for the cluster interconnect, and set ddi_msix_alloc_limit to 8, the 
largest allowed value (this is the workaround discussed here until Crossbow 
becomes available). The 16 DMA channels on that card allow a fanout over 4 cpus 
per port, so with 4 ports we get a fanout over 16 cpus. This works quite well 
and solves our current problems with the interrupt load on our 128-way server. 
(We were also able to reduce the packet load through some further optimizations.)
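For reference, here is the fanout arithmetic above as a small Python sketch. It 
assumes that a card's 16 DMA channels are split evenly among its ports and that 
each channel's interrupt can land on its own cpu; the function names are 
hypothetical and not part of any Solaris tool.

```python
def fanout_per_port(dma_channels_per_card: int, ports_per_card: int) -> int:
    """Cpus one port can spread its interrupts over, assuming (hypothetically)
    that the card's DMA channels are divided evenly across its ports."""
    return dma_channels_per_card // ports_per_card


def total_fanout(cards: int, ports_used_per_card: int,
                 dma_channels_per_card: int, ports_per_card: int) -> int:
    """Total number of cpus receiving interrupts across all used ports."""
    per_port = fanout_per_port(dma_channels_per_card, ports_per_card)
    return cards * ports_used_per_card * per_port


# Four quad-port 1 Gbit cards, one port used on each, 16 DMA channels per card:
# 16 // 4 = 4 cpus per port, 4 ports * 4 cpus = 16 cpus.
print(total_fanout(cards=4, ports_used_per_card=1,
                   dma_channels_per_card=16, ports_per_card=4))  # 16
```

Under the same even-split assumption, two dual-port cards with one port used on 
each give 16 // 2 = 8 cpus per port, i.e. again 16 cpus in total.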

In the current tests we are running the target configuration instead: two 
dual-port nxge 10 Gbit NICs, with one port used on each, again giving us a 
fanout over 16 cpus (8 cpus per port). Basically, this configuration works very 
well too, except for one major problem:

The nxge driver happens to choose cpu 0 as one of the cpus (along with 7 
others) for its interrupt handler. Since this cpu always handles the clock 
interrupt, it is now overloaded with interrupt processing. Although we already 
put cpu 0 into a processor set (so that nothing but interrupts runs on it), it 
now reaches 100% sys load (mpstat). From the kstat interrupt statistics I 
calculated the per-level interrupt time over a 5-minute interval:
CPU   0 - Overall: 97.7%
CPU   0 - Level  1: 15.9%
CPU   0 - Level  6: 8.9%
CPU   0 - Level 10: 71.9%
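A minimal Python sketch of this kind of calculation: take two snapshots of the 
cumulative per-level interrupt time and divide each delta by the length of the 
interval. It assumes the counters come from `kstat -p cpu:0:intrstat`, whose 
`level-N-time` statistics are taken to be cumulative nanoseconds; the function 
names and the sample numbers are hypothetical, not the measurements above.

```python
from typing import Dict


def parse_level_times(kstat_p_output: str) -> Dict[int, int]:
    """Parse `kstat -p` lines ("module:inst:name:stat<TAB>value") into
    {interrupt_level: cumulative_time_ns}; all other statistics are ignored."""
    times: Dict[int, int] = {}
    for line in kstat_p_output.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue
        stat = fields[0].rsplit(":", 1)[-1]      # e.g. "level-10-time"
        parts = stat.split("-")
        if len(parts) == 3 and parts[0] == "level" and parts[2] == "time":
            times[int(parts[1])] = int(fields[1])
    return times


def level_busy_percent(before: Dict[int, int], after: Dict[int, int],
                       interval_s: float) -> Dict[int, float]:
    """Percentage of the interval the cpu spent in each interrupt level."""
    interval_ns = interval_s * 1e9
    return {lvl: 100.0 * (after[lvl] - before.get(lvl, 0)) / interval_ns
            for lvl in sorted(after)}


# Hypothetical snapshots taken 300 s (5 minutes) apart.
t0 = parse_level_times("cpu:0:intrstat:level-1-time\t1000000000\n"
                       "cpu:0:intrstat:level-10-time\t2000000000\n")
t1 = parse_level_times("cpu:0:intrstat:level-1-time\t31000000000\n"
                       "cpu:0:intrstat:level-10-time\t152000000000\n")
pcts = level_busy_percent(t0, t1, 300)
print(pcts)                # {1: 10.0, 10: 50.0}
print(sum(pcts.values()))  # 60.0, the overall share for the listed levels
```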

cpu 0 is already 71.9% busy with the clock interrupts (level 10). On top of that 
come some level 1 interrupts (from the PCIe-to-PCI bridge driver, I believe) and 
some level 6 interrupts, which are from our nxge NIC. Since the clock interrupt 
has the highest priority of these, nxge performance suffers: we see bad packet 
latencies on the interconnect whenever cpu 0 becomes overloaded.

So my question is: how can I prevent nxge (and the PCIe-to-PCI bridge) from 
choosing cpu 0 for their interrupts? "psradm -i 0" won't help, since it would 
also affect the clock interrupt (I want to make sure that no other hardware 
interrupts run on the cpu that handles the clock interrupt). So what I need is 
some way to tell the driver not to use cpu 0 (or, better, not to use whichever 
cpu is handling the clock interrupt).
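To make the request concrete, here is a hypothetical Python sketch of the 
placement policy I am asking for: distribute a port's interrupt vectors 
round-robin over the online cpus while skipping an excluded set (e.g. the clock 
cpu). This is only an illustration; it is not how nxge or the Solaris interrupt 
distribution code actually picks cpus, and `assign_interrupt_cpus` is a made-up 
name.

```python
from typing import Iterable, List


def assign_interrupt_cpus(vectors: int, online_cpus: Iterable[int],
                          excluded: Iterable[int]) -> List[int]:
    """Return one target cpu per interrupt vector, round-robin over the
    online cpus, never using a cpu in `excluded` (hypothetical policy)."""
    skip = set(excluded)
    eligible = [cpu for cpu in online_cpus if cpu not in skip]
    if not eligible:
        raise ValueError("no eligible cpus left for interrupt targets")
    return [eligible[i % len(eligible)] for i in range(vectors)]


# 8 vectors for one 10 Gbit port, clock cpu 0 excluded.
print(assign_interrupt_cpus(8, range(16), excluded=[0]))
# [1, 2, 3, 4, 5, 6, 7, 8]
```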

Is there a solution for this problem in S10U4?
Will there be a solution in Crossbow?

Thanks a lot,
Nick.
 
 
This message posted from opensolaris.org
