On 10/23/08 16:46, Somenath Bandyopadhyay wrote:
> (Sorry for posting a sol10 networking question here, let me know if I
> should post it elsewhere)
It might help to know which update of S10 you are running. Now that you
have a reference point, it might be interesting to see the results with
SXCE.

> Problem: We don't see 4 x 1 GigE cards producing 4Gb/s of throughput.
>
> Our setup: two nodes (n1, n2) are connected back to back with 4 GigE
> NIC cards.
> Each individual NIC can produce 100MBps throughput.

Is that theoretical, or have you actually seen 100MB/s with one NIC and
one connection? If not, is your window size limiting the throughput?

On a dual-socket, dual-core system, an x4100, I have run three e1000g
interfaces at wire speed (I don't remember whether there was CPU left
over, and if I had needed it, whether I could have gotten more). That
was with S10 8/07, and 1 to 32MB window sizes. One TCP connection per
NIC. The workload was ttcp and then FTP. I was using three zones and IP
Instances and had one NIC per non-global zone.

> n1 is the client and n2 is the server. n1 is trying to read data stored
> in n2's memory without hitting disk.
>
> If I run the same application (on all 4 NICs at the same time) then the
> max I get is 200MBps. With 2 NICs I get 150MBps.
>
> I see that cpu#6 is getting heavily loaded (100%) and "ithr" in mpstat
> is very high for cpu#6; see a sample mpstat output below.
>
> n2># mpstat 1 1000
> CPU minf mjf  xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
>   0    0   0 71414 42597  241 3660  111  728  188    0  1336    0  46   0  54
>   1    0   0 49839 45700    0 4228  100  635  149    0   906    0  40   0  60
>   2    0   0 67422 41955    0 1484   47  267  178    0  1243    0  43   0  57
>   3    0   0 60928 43176    0 1260   44  198  424    0  1061    0  43   0  57
>   4    0   0 27945 47010    3  552    8   63  187    0   571    1  29   0  70
>   5    0   0 29726 46722    1  626    7   73   63    0   515    0  27   0  73
>   6    0   0     0 52581 1872  387  114   10  344    0     8    0  99   0   1
>   7    0   0 48189 44176    0 1077   25  152  150    0   858    0  34   0  66
>
> On n1, processor #6 is loaded 60%, and the rest of the processors are
> below 50%. I got these results with the default system parameters.

CPU 6 is doing all the interrupt processing. I don't know if it would
make any difference in your case, but creating a processor set with CPU
6 in it will prevent any user-land processing from getting pinned by an
interrupt. 'psrset -c 6', for example.

intrstat will show how much of that CPU is spent in interrupt
processing.

I don't know if it is possible with bge to spread interrupts.

> This happened with an MTU size of 1500 on the Broadcom GigE NIC.
> When I use mtu = 9000, then I get throughput close to 3.8Gbps.
> cpu#6 is still >90% busy and ithr is still very high on cpu#6... the
> only difference is that the other cpus are also busy (close to 90%).

This suggests interrupts are a gating factor.

> I tried changing some /etc/system parameters, e.g.
>
> * distribute squeues among all cpus
> * do this when NICs are faster than CPUs
> set ip:ip_squeue_fanout=1
>
> (this was not the case in our setup, we have 8 x 2.33GHz processors vs
> 4 x 1GigE NICs, but I still tried it)
>
> * if the number of cpus is far more than the number of nics
> set ip:tcp_squeue_wput=1
> (since this was the case, I tried this, without any improvement)
>
> * latency-sensitive machines should set this to zero
> * default is: worker threads wait for 10ms
> * val=0 means no wait, serve immediately
> set ip:ip_squeue_wait=0
>
> but without effect. Setting ip:ip_squeue_fanout=1 causes the benchmark
> to fail to run; the tcp connection works otherwise.
>
> 1)
> So, my question is: why is the cpu% so high on cpu#6 only?
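
A couple of quick things that might help narrow that down (just a rough
sketch; run as root, CPU 6 is simply the ID from your mpstat output, and
the ndd parameter names are from memory, so double-check them with
'ndd /dev/tcp \?'):

  # how much of each CPU is being spent in interrupt handling, per device
  intrstat 1 10

  # current TCP transmit/receive window (high-water) defaults
  ndd /dev/tcp tcp_xmit_hiwat
  ndd /dev/tcp tcp_recv_hiwat

  # keep user threads off CPU 6 while you test; interrupts still land
  # there, but they no longer pin application threads
  psrset -c 6

  # list the processor sets to confirm, and delete the set when done
  psrset
  psrset -d <set_id>
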
> Though the problem is solved with jumbo frames for 2 machines, if we
> increase the number of nodes this scalability problem will show up
> again with 3, 4, 5, ... machines (since cpu utilization is very high in
> the current state).
>
> Is there any kernel tunable I should try to distribute the load
> differently? Are all TCP connections (and squeues) getting tied to
> processor #6? Is there a way to distribute connections among the other
> processors?

What is the output of

  echo "ip_soft_rings_cnt/X" | mdb -k

Have you tried, in /etc/system,

  set ip_squeue_soft_ring=1
  set ip:ip_soft_rings_cnt=8

http://www.solarisinternals.com/wiki/index.php/Networks has lots of
tips, although none for bge.

Steffen

> 2) With 24 TCP connections established, I again see that only cpu 6 has
> some mblks (and I don't see any for the others)... I didn't capture the
> mblks for cpu #6 in this example though.
>
> Is there something wrong here? Shouldn't each TCP connection have its
> own squeue?
>
> [3]> ::squeue
>             ADDR STATE CPU            FIRST             LAST           WORKER
> ffffffff98e199c0 02060   7 0000000000000000 0000000000000000 fffffe8001139c80
> ffffffff98e19a80 02060   6 0000000000000000 0000000000000000 fffffe8001133c80
> ffffffff98e19b40 02060   5 0000000000000000 0000000000000000 fffffe80010dfc80
> ffffffff98e19c00 02060   4 0000000000000000 0000000000000000 fffffe800108bc80
> ffffffff98e19cc0 02060   3 0000000000000000 0000000000000000 fffffe8001037c80
> ffffffff98e19d80 02060   2 0000000000000000 0000000000000000 fffffe8000fe3c80
> ffffffff98e19e40 02060   1 0000000000000000 0000000000000000 fffffe80004ebc80
> ffffffff98e19f00 02060   0 0000000000000000 0000000000000000 fffffe8000293c80
> [3]> :c
>
> thanks, som
> ([EMAIL PROTECTED], ph: 650-527-1566)
> --
> This message posted from opensolaris.org
> _______________________________________________
> networking-discuss mailing list
> [email protected]
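
If you do add those soft ring lines to /etc/system, one way to confirm
after the reboot that they took effect, and that the load really does
spread, is to re-run the mdb check and watch mpstat/intrstat while the
benchmark runs (again just a sketch):

  # value in the running kernel (hex)
  echo "ip_soft_rings_cnt/X" | mdb -k

  # watch whether intr/ithr move off CPU 6 and spread across the others
  mpstat 1 10
  intrstat 1 10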
