> On 10/23/08 16:46, Somenath Bandyopadhyay wrote:
> > (Sorry for posting a Solaris 10 networking question here; let me know if
> > I should post it elsewhere.)
>
> It might help to know which update of S10. Now that you have a reference
> point, it might be interesting to see the results with SXCE.
With update 4:
cat /etc/release
Solaris 10 8/07 s10x_u4wos_12b X86
Copyright 2007 Sun Microsystems, Inc. All Rights Reserved.
Use is subject to license terms.
Assembled 16 August 2007
>
> > Problem: We don't see 4 x 1 GigE cards producing 4 GigE of aggregate
> > throughput.
> >
> > Our setup: two nodes (n1, n2) are connected back to back with 4 GigE NICs.
> > Each individual NIC can produce 100 MB/s of throughput.
>
> Is that theoretical, or have you seen with one NIC and one connection that
> you get 100 MB/s?
No, I get 100 MB/s most of the time through any one of the 4 NICs at a time,
sometimes slightly less than that; 95-100 MB/s, I should say.
>
> If not, is your window size limiting the throughput?
I have not checked this, but for a single TCP connection I am getting full
bandwidth, and memory is not a limiting factor (16 GB).
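
If window size turns out to matter, my understanding is that the defaults can
be checked and raised with ndd roughly like this (the 1 MB / 4 MB values below
are just placeholders I picked, not tested recommendations, and they do not
persist across a reboot):

  # current limits
  ndd -get /dev/tcp tcp_max_buf
  ndd -get /dev/tcp tcp_xmit_hiwat
  ndd -get /dev/tcp tcp_recv_hiwat
  # raise the ceiling first, then the default send/receive buffers
  ndd -set /dev/tcp tcp_max_buf 4194304
  ndd -set /dev/tcp tcp_xmit_hiwat 1048576
  ndd -set /dev/tcp tcp_recv_hiwat 1048576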
>
> On a dual-socket, dual-core system (x4100), I have run three e1000g
> interfaces at wire speed (I don't remember whether there was CPU left over
> and, had I needed it, whether I could have gotten more). That was with
> S10 8/07 and 1 to 32 MB window sizes, one TCP connection per NIC. The
> workload was ttcp and then FTP. I was using three zones and IP Instances
> and had one NIC per non-global zone.
I have one zone, and I didn't change the window size; I will try that.
Will multiple zones make a performance difference?
I tried with NFS and then with rcp; both give the same result.
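
To take NFS and rcp out of the picture, I can also try a raw ttcp run per NIC,
something along these lines (the port and address are placeholders for each
NIC pair; these are the classic ttcp options, and option sets vary between
ttcp builds):

  # on n2, one receiver per NIC/port
  ttcp -r -s -p 5001 -l 65536
  # on n1, one transmitter aimed at n2's address on that NIC
  ttcp -t -s -p 5001 -l 65536 -n 65536 192.168.10.2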
>
> > n1 is the client and n2 is the server. n1 is trying to read data stored
> > in n2's memory without hitting disk.
> >
> > If I run the same application (on all 4 NICs at the same time), then the
> > maximum I get is 200 MB/s. With 2 NICs I get 150 MB/s.
> >
> > I noticed that CPU 6 is getting heavily loaded (100%) and that "ithr" in
> > mpstat is very high for CPU 6; see below for a sample of the mpstat output.
> >
> > n2>#mpstat 1 1000
> > CPU minf mjf  xcal  intr ithr  csw icsw migr smtx srw syscl usr sys wt idl
> >   0    0   0 71414 42597  241 3660  111  728  188   0  1336   0  46  0  54
> >   1    0   0 49839 45700    0 4228  100  635  149   0   906   0  40  0  60
> >   2    0   0 67422 41955    0 1484   47  267  178   0  1243   0  43  0  57
> >   3    0   0 60928 43176    0 1260   44  198  424   0  1061   0  43  0  57
> >   4    0   0 27945 47010    3  552    8   63  187   0   571   1  29  0  70
> >   5    0   0 29726 46722    1  626    7   73   63   0   515   0  27  0  73
> >   6    0   0     0 52581 1872  387  114   10  344   0     8   0  99  0   1
> >   7    0   0 48189 44176    0 1077   25  152  150   0   858   0  34  0  66
>
> > On n1, processor 6 is loaded at 60%; the rest of the processors are below
> > 50%. These results I got with default system parameters.
>
> CPU 6 is doing all the interrupt processing. I don't know if it would make
> any difference in your case, but creating a processor set with CPU 6 in it
> will prevent any userland processing from getting pinned by an interrupt.
> 'psrset -c 6', for example. intrstat will show how much of that CPU is
> spent in interrupt processing. I don't know if it is possible with nge to
> spread interrupts.
It looks like all processors take interrupts (look at the intr column in
mpstat), but only the interrupt thread (ithr) is getting scheduled on CPU 6.
I will try that "psrset -c 6" idea and check with intrstat.
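
Roughly what I plan to run (as root), based on your suggestion:

  # fence CPU 6 off into its own processor set so user threads stay off it
  psrset -c 6
  # watch which device interrupts land on which CPU, in 1-second samples
  intrstat 1
  # remove the set again afterwards (N is the set id that psrset printed)
  psrset -d N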
>
> >
> > This happened with an MTU of 1500 on the Broadcom GigE NIC. When I use
> > mtu = 9000, I get throughput close to 3.8 Gbps. CPU 6 is still >90% busy
> > and ithr is still very high on CPU 6; the only difference is that the
> > other CPUs are also busy (close to 90%).
>
> This suggests interrupts are a gating factor.
>
> >
> > I tried changing some /etc/system parameters, e.g.:
> >
> >   * distribute squeues among all cpus
> >   * do this when NICs are faster than CPUs
> >   set ip:ip_squeue_fanout=1
> >
> > (this was not the case in our setup, we have 8 x 2.33 GHz processors vs
> > 4 x 1 GigE NICs, but I still tried it)
> >
> >   * if the number of cpus is far more than the number of nics
> >   set ip:tcp_squeue_wput=1
> > (since this was the case, I tried this, without any improvement)
> >
> >   * latency-sensitive machines should set this to zero
> >   * default is: worker threads wait for 10ms
> >   * val=0 means no wait, serve immediately
> >   ip:ip_squeue_wait=0
> >
> > but without effect. Setting ip:ip_squeue_fanout=1 causes the benchmark to
> > fail to run; TCP connections otherwise work.
> >
> > 1)
> > So, my question is: why is the CPU utilization so high on CPU 6 only?
> > Though the problem is solved with jumbo frames for 2 machines, this
> > scalability problem will show up again as we increase the number of nodes
> > to 3, 4, 5, ... machines (since CPU utilization is already very high in
> > the current state).
> >
> > Is there any kernel tunable I should try in order to distribute the load
> > differently? Are all TCP connections (and squeues) getting tied to
> > processor 6? Is there a way to distribute connections among the other
> > processors?
>
> What is the output of 'echo "ip_soft_rings_cnt/X" | mdb -k'?
This was the default:
[EMAIL PROTECTED] ftp]# echo "ip_soft_rings_cnt/X" | mdb -k
ip_soft_rings_cnt:
ip_soft_rings_cnt: 2
I also tried bumping it to 8:
[EMAIL PROTECTED] /]# echo "ip_soft_rings_cnt/X" | mdb -k
ip_soft_rings_cnt:
ip_soft_rings_cnt: 8
without improvement.
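
In case it helps anyone else, a non-persistent way to change it is to write
the variable in the running kernel with mdb -kw, roughly as below; my
understanding is that the value is only picked up when an interface is
plumbed, so the NICs have to be re-plumbed (or the box rebooted with the
/etc/system setting) for it to take effect:

  # write decimal 8 into ip_soft_rings_cnt (0t marks a decimal value in mdb)
  echo "ip_soft_rings_cnt/W 0t8" | mdb -kw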
>
> Have you tried, in /etc/system:
>   set ip:ip_squeue_soft_ring=1
>   set ip:ip_soft_rings_cnt=8
I haven't tried ip_squeue_soft_ring yet; I will try it!
>
> http://www.solarisinternals.com/wiki/index.php/Networks has lots of tips,
> although none for bge.
Thanks a lot!
I haven't found documentation for the bge tunables anywhere; I just tried
tuning what is in /kernel/drv/bnx.conf.
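
For my next round of testing, these are the /etc/system lines I plan to try,
collected from this thread (module-prefixed forms assumed; a reboot is needed
for /etc/system changes to take effect):

  set ip:ip_squeue_soft_ring=1
  set ip:ip_soft_rings_cnt=8
  * already tried, no improvement:
  * set ip:tcp_squeue_wput=1
  * only suggested when the NICs outrun the CPUs, which is not our case:
  * set ip:ip_squeue_fanout=1
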
thanks, som.
>
> Steffen
>
> >
> > 2) With 24 TCP connections established, I again see that only CPU 6 has
> > some mblks (and I don't see any for the others). I didn't capture the
> > mblks for CPU 6 in this example, though.
> >
> > Is there something wrong here? Shouldn't each TCP connection have its own
> > squeue?
> >
> > [3]> ::squeue
> >             ADDR STATE CPU            FIRST             LAST           WORKER
> >           e199c0 02060   7 0000000000000000 0000000000000000 fffffe8001139c80
> > ffffffff98e19a80 02060   6 0000000000000000 0000000000000000 fffffe8001133c80
> > ffffffff98e19b40 02060   5 0000000000000000 0000000000000000 fffffe80010dfc80
> > ffffffff98e19c00 02060   4 0000000000000000 0000000000000000 fffffe800108bc80
> > ffffffff98e19cc0 02060   3 0000000000000000 0000000000000000 fffffe8001037c80
> > ffffffff98e19d80 02060   2 0000000000000000 0000000000000000 fffffe8000fe3c80
> > ffffffff98e19e40 02060   1 0000000000000000 0000000000000000 fffffe80004ebc80
> > ffffffff98e19f00 02060   0 0000000000000000 0000000000000000 fffffe8000293c80
> > [3]> :c
> >
> > thanks, som
> > ([EMAIL PROTECTED], ph: 650-527-1566)
--
This message posted from opensolaris.org
_______________________________________________
networking-discuss mailing list
[email protected]