On 10/23/08 16:46, Somenath Bandyopadhyay wrote:
> (Sorry for posting sol10 networking question here, let me know if I should 
> post it elsewhere)

It might help to know which update of S10. Now that you have a reference 
point, it might be interesting to see the results with SXCE.

> Problem:  We don't see 4 x 1 GigE cards producing 4 Gbit/s of aggregate 
> throughput.
> 
> our setup: two nodes (n1, n2) are back to back connected with 4 GigE NIC 
> cards.
> Each individual NIC can produce 100MBps throughput.

Is that theoretical, or have you actually measured 100MB/s with a single 
NIC and a single connection?

If not, is your window size limiting the throughput?
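One quick sanity check is the bandwidth-delay product: a single TCP 
connection can't go faster than window/RTT, regardless of link speed. A 
back-of-the-envelope sketch (the 1 ms RTT is a made-up example figure; a 
back-to-back GigE link is usually well under that, which shrinks the 
required window further):

```shell
# Hypothetical numbers: 1 Gbit/s link, assumed 1 ms round-trip time.
link_bps=1000000000      # 1 GigE in bits per second
rtt_us=1000              # assumed RTT of 1 ms, in microseconds

# Window needed to keep the pipe full: bandwidth * delay, in bytes.
bdp_bytes=$(( link_bps / 8 * rtt_us / 1000000 ))
echo "window needed: ${bdp_bytes} bytes"   # 125000 bytes, about 122KB
```

If the send/receive buffers are smaller than that, one connection will 
stall waiting for ACKs before the pipe is full.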

On a dual-socket, dual-core system (an x4100), I have run three e1000g 
interfaces at wire speed (I don't remember whether there was CPU left 
over, or whether I could have pushed more if I had needed to). That was 
with S10 8/07 and 1 to 32MB window sizes, one TCP connection per NIC. 
The workload was ttcp and then FTP. I was using three zones with IP 
Instances and had one NIC per non-global zone.

> n1 is the client and n2 is the server. n1 is trying to read data stored in 
> n2's memory
> without hitting disk.
> 
> If I run the same application on all 4 NICs at the same time, the max I get
> is 200MBps. With 2 NICs I get 150MBps.
> 
> I observed that cpu#6 is heavily loaded (100%) and that "ithr" in mpstat is
> very high for cpu#6; see a sample mpstat output below.
> 
> n2>#mpstat 1 1000
> CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
>  0    0   0 71414 42597  241 3660  111  728  188    0  1336    0  46   0  54
>  1    0   0 49839 45700    0 4228  100  635  149    0   906    0  40   0  60
>  2    0   0 67422 41955    0 1484   47  267  178    0  1243    0  43   0  57
>  3    0   0 60928 43176    0 1260   44  198  424    0  1061    0  43   0  57
>  4    0   0 27945 47010    3  552    8   63  187    0   571    1  29   0  70
>  5    0   0 29726 46722    1  626    7   73   63    0   515    0  27   0  73
>  6    0   0    0 52581 1872  387  114   10  344    0     8    0  99   0   1
>  7    0   0 48189 44176    0 1077   25  152  150    0   858    0  34   0  66
> 
> On n1, processor #6 is loaded 60%; the rest of the processors are below 50%.
> These results were obtained with default system parameters.

CPU 6 is doing all the interrupt processing. I don't know if it would 
make any difference in your case, but creating a processor set with CPU 
6 in it will prevent any user-land processing from getting pinned by an 
interrupt: 'psrset -c 6', for example. intrstat will show how much of 
that CPU is spent in interrupt processing. I don't know if it is 
possible with nge to spread interrupts.
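As an illustrative sketch, the sequence might look like the following. 
These are Solaris-specific administrative commands run as root, so treat 
this as a fragment to adapt rather than something to run verbatim:

```shell
# Fence CPU 6 off so user threads can't be scheduled on it;
# interrupt handling still runs there.
psrset -c 6

# Sample per-CPU interrupt time and per-device interrupt rates
# every 5 seconds, to confirm where the interrupt load lands.
intrstat 5

# Show which device interrupts are bound to which CPU.
echo "::interrupts" | mdb -k
```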

> 
> This happened with an MTU of 1500 on the Broadcom GigE NIC.
> When I use mtu = 9000, I get throughput close to 3.8Gbps.
> cpu#6 is still >90% busy and ithr is still very high on cpu#6; the only
> difference is that the other cpus are also busy (close to 90%).

This suggests interrupts are a gating factor.
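Back-of-the-envelope arithmetic (headers ignored, integer-rounded) shows 
why jumbo frames help here: at the same byte rate, 9000-byte frames mean 
roughly one sixth the packets, and therefore far fewer interrupts, than 
1500-byte frames:

```shell
# Approximate packet rates at ~4 Gbit/s aggregate, for both MTUs.
rate_bps=4000000000
pps_1500=$(( rate_bps / 8 / 1500 ))   # ~333,333 packets/s at MTU 1500
pps_9000=$(( rate_bps / 8 / 9000 ))   # ~55,555 packets/s at MTU 9000
echo "MTU 1500: ${pps_1500} pps vs MTU 9000: ${pps_9000} pps"
```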

> 
> I tried changing some /etc/system parameters e.g.
> *       distribute squeues among all cpus
> *       do this when NICs are faster than CPUs
>        set ip:ip_squeue_fanout=1
> 
> (this was not the case in our setup, we have 8 x 2.33GHz processors vs
> 4 x 1GigE NICs, but I still tried it)
> 
> *       if number of cpus far more than number of nics
>        set ip:tcp_squeue_wput=1
> (since this was the case, I tried this, without any improvement)
> 
> *       latency-sensitive machines should set this to zero
> *       default is: worker threads wait for 10ms
> *       val=0 means no wait, serve immediately
>        set ip:ip_squeue_wait=0
> 
> 
> but without effect. Setting ip:ip_squeue_fanout=1 causes the benchmark to
> fail to run; TCP connections otherwise work fine.
> 
> 1)
> So, my question is: why is the CPU utilization so high on cpu#6 only?
> Though the problem is solved with jumbo frames for 2 machines, this
> scalability problem will reappear as we add nodes (3, 4, 5, ... machines),
> since CPU utilization is already very high in the current state.
> 
> Is there any kernel tunable I should try to distribute the load differently?
> Are all TCP connections (and squeues) getting tied with processor #6?
> Is there a way to distribute connections among other processors?

What is the output of 'echo "ip_soft_rings_cnt/X" | mdb -k'?

Have you tried, in /etc/system:
set ip:ip_squeue_soft_ring=1
set ip:ip_soft_rings_cnt=8

http://www.solarisinternals.com/wiki/index.php/Networks has lots of 
tips, although none for bge.

Steffen

> 
> 2) With 24 TCP connections established, I again see that only cpu 6 has
> some mblks (and none for the others). I didn't capture the mblks for
> cpu #6 in this example, though.
> 
> Is there something wrong here? Shouldn't each TCP connection have its own
> squeue?
> 
> [3]> ::squeue
>             ADDR STATE CPU            FIRST             LAST           WORKER
> ffffffff98e199c0 02060   7 0000000000000000 0000000000000000 fffffe8001139c80
> ffffffff98e19a80 02060   6 0000000000000000 0000000000000000 fffffe8001133c80
> ffffffff98e19b40 02060   5 0000000000000000 0000000000000000 fffffe80010dfc80
> ffffffff98e19c00 02060   4 0000000000000000 0000000000000000 fffffe800108bc80
> ffffffff98e19cc0 02060   3 0000000000000000 0000000000000000 fffffe8001037c80
> ffffffff98e19d80 02060   2 0000000000000000 0000000000000000 fffffe8000fe3c80
> ffffffff98e19e40 02060   1 0000000000000000 0000000000000000 fffffe80004ebc80
> ffffffff98e19f00 02060   0 0000000000000000 0000000000000000 fffffe8000293c80
> [3]> :c
> 
> thanks, som
> ([EMAIL PROTECTED], ph: 650-527-1566)
> --
> This message posted from opensolaris.org
> _______________________________________________
> networking-discuss mailing list
> [email protected]
