> On 10/23/08 16:46, Somenath Bandyopadhyay wrote:
> > (Sorry for posting a Solaris 10 networking question here; let me know if
> > I should post it elsewhere.)
>
> It might help to know which update of S10. Now that you have a reference
> point, it might be interesting to see the results with SXCE.
With update 4:
cat /etc/release
Solaris 10 8/07 s10x_u4wos_12b X86
Copyright 2007 Sun Microsystems, Inc. All Rights Reserved.
Use is subject to license terms.
Assembled 16 August 2007
>
> > Problem: We don't see 4 x 1 GigE cards producing 4 GigE of aggregate
> > throughput.
> >
> > Our setup: two nodes (n1, n2) are connected back to back with 4 GigE NICs.
> > Each individual NIC can produce 100 MB/s of throughput.
>
> Is that theoretical, or have you seen with one NIC and one connection that
> you get 100 MB/s?
No, I get 100 MB/s most of the time through any one of the 4 NICs at a time,
sometimes slightly less than that; 95-100 MB/s, I should say.
>
> If not, is your window size limiting the throughput?
I have not checked this, but for a single TCP connection I am getting full
bandwidth, and memory is not a limiting factor (16 GB).
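
If window size turns out to matter, my understanding is that the defaults can
be checked and raised with ndd roughly like this (the 1 MB / 4 MB values below
are just placeholders I picked, not tested recommendations, and they do not
persist across a reboot):

  # current limits
  ndd -get /dev/tcp tcp_max_buf
  ndd -get /dev/tcp tcp_xmit_hiwat
  ndd -get /dev/tcp tcp_recv_hiwat
  # raise the ceiling first, then the default send/receive buffers
  ndd -set /dev/tcp tcp_max_buf 4194304
  ndd -set /dev/tcp tcp_xmit_hiwat 1048576
  ndd -set /dev/tcp tcp_recv_hiwat 1048576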
>
> On a dual-socket, dual-core system (x4100), I have run three e1000g
> interfaces at wire speed (I don't remember whether there was CPU left over
> and, had I needed it, whether I could have gotten more). That was with
> S10 8/07 and 1 to 32 MB window sizes, one TCP connection per NIC. The
> workload was ttcp and then FTP. I was using three zones and IP Instances
> and had one NIC per non-global zone.
I have one zone, and I didn't change the window size; I will try that.
Will multiple zones make a performance difference?
I tried with NFS and then with rcp; both give the same result.
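
To take NFS and rcp out of the picture, I can also try a raw ttcp run per NIC,
something along these lines (the port and address are placeholders for each
NIC pair; these are the classic ttcp options, and option sets vary between
ttcp builds):

  # on n2, one receiver per NIC/port
  ttcp -r -s -p 5001 -l 65536
  # on n1, one transmitter aimed at n2's address on that NIC
  ttcp -t -s -p 5001 -l 65536 -n 65536 192.168.10.2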
>
> > n1 is the client and n2 is the server. n1 is trying to read data stored
> > in n2's memory without hitting disk.
> >
> > If I run the same application (on all 4 NICs at the same time), then the
> > maximum I get is 200 MB/s. With 2 NICs I get 150 MB/s.
> >
> > I noticed that CPU 6 is getting heavily loaded (100%) and that "ithr" in
> > mpstat is very high for CPU 6; see below for a sample of the mpstat output.
> >
> > n2>#mpstat 1 1000
> > CPU minf mjf  xcal  intr ithr  csw icsw migr smtx srw syscl usr sys wt idl
> >   0    0   0 71414 42597  241 3660  111  728  188   0  1336   0  46  0  54
> >   1    0   0 49839 45700    0 4228  100  635  149   0   906   0  40  0  60
> >   2    0   0 67422 41955    0 1484   47  267  178   0  1243   0  43  0  57
> >   3    0   0 60928 43176    0 1260   44  198  424   0  1061   0  43  0  57
> >   4    0   0 27945 47010    3  552    8   63  187   0   571   1  29  0  70
> >   5    0   0 29726 46722    1  626    7   73   63   0   515   0  27  0  73
> >   6    0   0     0 52581 1872  387  114   10  344   0     8   0  99  0   1
> >   7    0   0 48189 44176    0 1077   25  152  150   0   858   0  34  0  66
>
> > On n1, processor 6 is loaded at 60%; the rest of the processors are below
> > 50%. These results I got with default system parameters.
>
> CPU 6 is doing all the interrupt processing. I don't know if it would make
> any difference in your case, but creating a processor set with CPU 6 in it
> will prevent any userland processing from getting pinned by an interrupt.
> 'psrset -c 6', for example. intrstat will show how much of that CPU is
> spent in interrupt processing. I don't know if it is possible with nge to
> spread interrupts.
It looks like all processors take interrupts (look at the intr column in
mpstat), but only the interrupt thread (ithr) is getting scheduled on CPU 6.
I will try that "psrset -c 6" idea and check with intrstat.
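
Roughly what I plan to run (as root), based on your suggestion:

  # fence CPU 6 off into its own processor set so user threads stay off it
  psrset -c 6
  # watch which device interrupts land on which CPU, in 1-second samples
  intrstat 1
  # remove the set again afterwards (N is the set id that psrset printed)
  psrset -d N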
>
> >
> > This happened with an MTU of 1500 on the Broadcom GigE NIC. When I use
> > mtu = 9000, I get throughput close to 3.8 Gbps. CPU 6 is still >90% busy
> > and ithr is still very high on CPU 6; the only difference is that the
> > other CPUs are also busy (close to 90%).
>
> This suggests interrupts are a gating factor.
>
> >
> > I tried changing some /etc/system parameters, e.g.:
> >
> >   * distribute squeues among all cpus
> >   * do this when NICs are faster than CPUs
> >   set ip:ip_squeue_fanout=1
> >
> > (this was not the case in our setup, we have 8 x 2.33 GHz processors vs
> > 4 x 1 GigE NICs, but I still tried it)
> >
> >   * if the number of cpus is far more than the number of nics
> >   set ip:tcp_squeue_wput=1
> > (since this was the case, I tried this, without any improvement)
> >
> >   * latency-sensitive machines should set this to zero
> >   * default is: worker threads wait for 10ms
> >   * val=0 means no wait, serve immediately
> >   ip:ip_squeue_wait=0
> >
> > but without effect. Setting ip:ip_squeue_fanout=1 causes the benchmark to
> > fail to run; TCP connections otherwise work.
> >
> > 1)
> > So, my question is: why is the CPU utilization so high on CPU 6 only?
> > Though the problem is solved with jumbo frames for 2 machines, this
> > scalability problem will show up again as we increase the number of nodes
> > to 3, 4, 5, ... machines (since CPU utilization is already very high in
> > the current state).
> >
> > Is there any kernel tunable I should try in order to distribute the load
> > differently? Are all TCP connections (and squeues) getting tied to
> > processor 6? Is there a way to distribute connections among the other
> > processors?
>
> What is the output of 'echo "ip_soft_rings_cnt/X" | mdb -k'?
This was the default:
[EMAIL PROTECTED] ftp]# echo "ip_soft_rings_cnt/X" | mdb -k
ip_soft_rings_cnt:
ip_soft_rings_cnt: 2
I also tried bumping it to 8:
[EMAIL PROTECTED] /]# echo "ip_soft_rings_cnt/X" | mdb -k
ip_soft_rings_cnt:
ip_soft_rings_cnt: 8
without improvement.
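
In case it helps anyone else, a non-persistent way to change it is to write
the variable in the running kernel with mdb -kw, roughly as below; my
understanding is that the value is only picked up when an interface is
plumbed, so the NICs have to be re-plumbed (or the box rebooted with the
/etc/system setting) for it to take effect:

  # write decimal 8 into ip_soft_rings_cnt (0t marks a decimal value in mdb)
  echo "ip_soft_rings_cnt/W 0t8" | mdb -kw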
>
> Have you tried, in /etc/system:
>   set ip:ip_squeue_soft_ring=1
>   set ip:ip_soft_rings_cnt=8
I haven't tried ip_squeue_soft_ring yet; I will try it!
>
> http://www.solarisinternals.com/wiki/index.php/Networks has lots of tips,
> although none for bge.
Thanks a lot!
I haven't found documentation for the bge tunables anywhere; I just tried
tuning what is in /kernel/drv/bnx.conf.
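
For my next round of testing, these are the /etc/system lines I plan to try,
collected from this thread (module-prefixed forms assumed; a reboot is needed
for /etc/system changes to take effect):

  set ip:ip_squeue_soft_ring=1
  set ip:ip_soft_rings_cnt=8
  * already tried, no improvement:
  * set ip:tcp_squeue_wput=1
  * only suggested when the NICs outrun the CPUs, which is not our case:
  * set ip:ip_squeue_fanout=1
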
thanks, som.
>
> Steffen
>
> >
> > 2) With 24 TCP connections established, I again see that only CPU 6 has
> > some mblks (and I don't see any for the others). I didn't capture the
> > mblks for CPU 6 in this example, though.
> >
> > Is there something wrong here? Shouldn't each TCP connection have its own
> > squeue?
> >
> > [3]> ::squeue
> >             ADDR STATE CPU            FIRST             LAST           WORKER
> >           e199c0 02060   7 0000000000000000 0000000000000000 fffffe8001139c80
> > ffffffff98e19a80 02060   6 0000000000000000 0000000000000000 fffffe8001133c80
> > ffffffff98e19b40 02060   5 0000000000000000 0000000000000000 fffffe80010dfc80
> > ffffffff98e19c00 02060   4 0000000000000000 0000000000000000 fffffe800108bc80
> > ffffffff98e19cc0 02060   3 0000000000000000 0000000000000000 fffffe8001037c80
> > ffffffff98e19d80 02060   2 0000000000000000 0000000000000000 fffffe8000fe3c80
> > ffffffff98e19e40 02060   1 0000000000000000 0000000000000000 fffffe80004ebc80
> > ffffffff98e19f00 02060   0 0000000000000000 0000000000000000 fffffe8000293c80
> > [3]> :c
> >
> > thanks, som
> > ([EMAIL PROTECTED], ph: 650-527-1566)
--
This message posted from opensolaris.org
_______________________________________________
networking-discuss mailing list
[email protected]