Hi all,

unfortunately it took us quite a while to finish the new setup of our system, but we have now been running tests again for a couple of days. The results are good, except for one major problem (see below).
For the first tests, we used 4 Quad-Port nxge 1 GBit NICs, with one port on each used for the cluster interconnect, and set ddi_msix_alloc_limit to 8, the largest allowed value (this is the workaround discussed earlier until Crossbow is available). The 16 DMA channels on that card allow for a fanout over 4 cpus per port, so with 4 ports we have a fanout over 16 cpus. This works pretty well and solves our current problems with the interrupt load on our 128-way server. (We were able to reduce the packet load by some further optimizations.)

In the current tests we have the target configuration instead: 2 Dual-Port nxge 10 GBit NICs with one port on each in use, again giving us a fanout over 16 cpus (8 cpus per port). Basically, this configuration works very well too, except for one major problem: the nxge driver happens to pick cpu 0 as one of the cpus (among 7 others) for its interrupt handler. Since this cpu always handles the clock interrupts, it is now overloaded with interrupt processing. Although we already put cpu 0 into a processor set (so that nothing but interrupts runs on that cpu), it now reaches 100% sys load (mpstat).

From the kstat interrupt statistics I've calculated the per-level interrupt time over a period of 5 minutes:

CPU 0 - Overall: 97.7%
CPU 0 - Level 1: 15.9%
CPU 0 - Level 6: 8.9%
CPU 0 - Level 10: 71.9%

Cpu 0 is already 71.9% busy with the clock interrupts alone. On top of that come some Level 1 interrupts (PCIe to PCI bridge driver (?)) and some Level 6 interrupts, which is our nxge NIC. Since the clock interrupt has the higher priority, the nxge performance suffers: we see bad packet latencies on the interconnect when cpu 0 becomes overloaded.

So my question is: how can I prevent nxge (and the PCIe to PCI bridge) from choosing cpu 0 for their interrupts?
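For reference, here is a minimal sketch of how per-level percentages like the ones above can be derived from two kstat snapshots. It assumes the counters are the level-N-time values of the cpu:0:intrstat kstat and that they are cumulative nanoseconds. The function name and the counter values are hypothetical, chosen only so that the output matches the figures quoted above; they are not the raw counters from the test.

```python
def interrupt_share(before, after, interval_s):
    """Return the percentage of wall-clock time spent at each interrupt level.

    before/after map interrupt level -> cumulative time in nanoseconds
    (assumed: the level-N-time fields of the cpu:<id>:intrstat kstat).
    """
    interval_ns = interval_s * 1_000_000_000
    return {
        level: 100.0 * (after[level] - before[level]) / interval_ns
        for level in after
    }


# Hypothetical counters for cpu 0, sampled 5 minutes (300 s) apart.
before = {1: 5_000_000_000, 6: 2_000_000_000, 10: 40_000_000_000}
after = {1: 52_700_000_000, 6: 28_700_000_000, 10: 255_700_000_000}

shares = interrupt_share(before, after, 300)
for level, pct in sorted(shares.items()):
    print(f"CPU 0 - Level {level}: {pct:.1f}%")
```

Only the deltas between the two snapshots matter, so the absolute counter values at the start of the window are irrelevant.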
"psradm -i 0" won't help, since this would also affect the clock interrupt. (I want to make sure that no other HW interrupts run on the cpu that is handling the clock interrupt.) So what I need is a way to tell the driver not to use cpu 0, or better, not to use whichever cpu the clock interrupt is using.

Is there a solution for this problem in S10U4? Will there be a solution in Crossbow?

Thanks a lot,
Nick.

This message posted from opensolaris.org