> On 10/24/08 14:09, Somenath Bandyopadhyay wrote:
> > with intrstat I see only cpu#6 is getting interrupted...
> > 
> >       device |      cpu0 %tim      cpu1 %tim      cpu2 %tim      cpu3 %tim      cpu4 %tim      cpu5 %tim      cpu6 %tim
> > -------------+---------------------------------------------------------------------------------------------------------
> >        asy#0 |         0  0.0         0  0.0         0  0.0         0  0.0         0  0.0         0  0.0         0  0.0
> >        ata#0 |         0  0.0         0  0.0         0  0.0         0  0.0         0  0.0         0  0.0         0  0.0
> >        bnx#0 |         0  0.0         0  0.0         0  0.0         0  0.0         0  0.0         0  0.0       161  0.1
> >        bnx#1 |         0  0.0         0  0.0         0  0.0         0  0.0         0  0.0         0  0.0       161  0.0
> >        bnx#2 |         0  0.0         0  0.0         0  0.0         0  0.0         0  0.0         0  0.0       161  0.0
> >        bnx#3 |         0  0.0         0  0.0         0  0.0         0  0.0         0  0.0         0  0.0       161  0.0
> >       ehci#0 |         0  0.0         0  0.0         0  0.0         0  0.0         6  0.0         0  0.0         0  0.0
> >        mfi#0 |         0  0.0         0  0.0         0  0.0         0  0.0         0  0.0         0  0.0         0  0.0
> >       uhci#0 |         0  0.0         0  0.0         0  0.0         0  0.0         6  0.0         0  0.0         0  0.0
> >       uhci#1 |         0  0.0         0  0.0         0  0.0         0  0.0         0  0.0         1  0.0         0  0.0
> >       uhci#2 |         0  0.0         0  0.0         0  0.0         0  0.0         6  0.0         0  0.0         0  0.0
> > 
> > with mpstat (watch intr and ithr) all CPUs show interrupts, but the interrupt threads run on cpu#6, making it >90% busy!
> 
> It would be interesting to see the output under load.


      device |      cpu0 %tim      cpu1 %tim      cpu2 %tim      cpu3 %tim      cpu4 %tim      cpu5 %tim      cpu6 %tim
-------------+---------------------------------------------------------------------------------------------------------
       ata#0 |         0  0.0         0  0.0         0  0.0         0  0.0         0  0.0         0  0.0         0  0.0
       bnx#0 |         0  0.0         0  0.0         0  0.0         0  0.0         0  0.0         0  0.0      9881  0.9
       bnx#1 |         0  0.0         0  0.0         0  0.0         0  0.0         0  0.0         0  0.0      9881 24.2
       bnx#2 |         0  0.0         0  0.0         0  0.0         0  0.0         0  0.0         0  0.0      9881 16.5
       bnx#3 |         0  0.0         0  0.0         0  0.0         0  0.0         0  0.0         0  0.0      9880 16.0
      ehci#0 |         0  0.0         0  0.0         0  0.0         0  0.0         5  0.0         0  0.0         0  0.0
       mfi#0 |         0  0.0         0  0.0         0  0.0         0  0.0         0  0.0         0  0.0         0  0.0
      uhci#0 |         0  0.0         0  0.0         0  0.0         0  0.0         5  0.0         0  0.0         0  0.0
      uhci#1 |         0  0.0         0  0.0         0  0.0         0  0.0         0  0.0         0  0.0         0  0.0
      uhci#2 |         0  0.0         0  0.0         0  0.0         0  0.0         5  0.0         0  0.0         0  0.0

      device |      cpu7 %tim
-------------+---------------
       ata#0 |         0  0.0
       bnx#0 |         0  0.0
       bnx#1 |         0  0.0
       bnx#2 |         0  0.0
       bnx#3 |         0  0.0
      ehci#0 |         0  0.0
       mfi#0 |         0  0.0
      uhci#0 |         0  0.0
      uhci#1 |         0  0.0
      uhci#2 |         0  0.0


CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0    0   0 40836 38922  240 3322   55  463  471    0   750    0  41   0  59
  1    0   0 17022 41573    0 2534   46  291 3337    0   320    0  61   0  39
  2    0   0 77679 32521    0 1575   66  236  717    0  1411    0  52   0  48
  3    0   0 67097 33733  100 1544   44  236  645    0  1139    1  49   0  50
  4    0   0 31077 38689    3  669   16   68  206    0   700    0  36   0  64
  5    0   0 25283 39414    1  569    7   50  147    0   474    0  31   0  69
  6    0   0 7104 49393 7600  153   22   17 1141    0   156    0  92   0   8
  7    0   0 32602 38395    0  706   13   66  483    0   582    0  36   0  64

This is the tuning I used in /etc/system; the results are the same with the default tuning too!
        set ddi_msix_alloc_limit=0x8
        set ip_squeue_soft_ring=1
        set ip:ip_soft_rings_cnt=0x8
        set ip:ip_squeue_fanout=0x1
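
To double-check what is actually in effect after the reboot, one option (a quick sketch; it assumes the kernel symbol names match the tunables above and that the ip module is loaded) is to read the live values with mdb:

        # read the live kernel values; symbol names assumed to match the lines above
        echo "ddi_msix_alloc_limit/D" | mdb -k
        echo "ip_squeue_soft_ring/D"  | mdb -k
        echo "ip_soft_rings_cnt/D"    | mdb -k
        echo "ip_squeue_fanout/D"     | mdb -k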

I am hoping that with some tuning I will be able to distribute interrupts among the other processors and get rid of this problem!

The test is a simple rcp test reading 12 files over the 4 NICs (the load is distributed equally, and all files are cached in memory).
(The observation is almost the same with 1 file read per NIC, 2 files per NIC, etc.; the only difference I see is that the cpu% rises as more interrupts land on cpu 6.)
I can send you the exact programs if you want to see them.
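
Roughly, the test has this shape (a sketch only, not the actual script; the "server-nicN" host names and the file paths are placeholders):

        #!/bin/bash
        # 12 parallel reads: 3 files pulled over each of the 4 NICs,
        # each timed separately (one real/user/sys block per file, as below).
        # Host names and paths are placeholders for the real addresses/files.
        for nic in server-nic1 server-nic2 server-nic3 server-nic4; do
            for n in 1 2 3; do
                ( time rcp ${nic}:/export/cached/file_${nic}_${n} /dev/null ) &
            done
        done
        wait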

[EMAIL PROTECTED] rcp]# ./rcp_test_over4nics_12files

real    0m26.218s
user    0m0.097s
sys     0m1.245s

real    0m26.399s
user    0m0.096s
sys     0m1.264s

real    0m26.447s
user    0m0.095s
sys     0m1.255s

real    0m40.875s
user    0m0.092s
sys     0m1.026s

real    0m41.034s
user    0m0.090s
sys     0m1.036s

real    0m41.163s
user    0m0.093s
sys     0m1.039s

real    0m41.255s
user    0m0.091s
sys     0m1.037s

real    0m41.274s
user    0m0.091s
sys     0m1.023s

real    0m41.302s
user    0m0.092s
sys     0m1.021s

real    0m41.545s
user    0m0.092s
sys     0m1.037s

real    0m41.593s
user    0m0.092s
sys     0m1.039s

real    0m41.613s
user    0m0.092s
sys     0m1.036s


> 
> I am not familar with bnx, so if the output above is
> under load,
> something doesn't match with the data below. The sum
> of 4x161 should be
> somewhat close to the number of interrupts in mpstat
> below.

I didn't capture the two outputs at the same time before, but this time I stopped mpstat and intrstat almost simultaneously.

> 
> I also understand that crosscalls also generate
> interrupts, so that 
> would explain why all CPUs have such high counts.
> Crosscalls suggest 
> cache invalidation, so I do wonder whether psets for
> the user processes 
> might help. 

So is it the case that all CPUs physically take the interrupts, but the interrupt threads get scheduled on only one processor?
I checked in the BIOS settings that all of the bnx controllers interrupt at IRQ 6, and they can't be changed to different values. There is also a BIOS setting for "distributing interrupts", but that didn't help either.
A pset for the user processes didn't improve things.
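
One way to see which CPU each vector is actually targeted at from the OS side is the ::interrupts dcmd in mdb (a sketch; the dcmd exists on x86 Solaris, but its output columns vary by release), and then re-sampling intrstat and mpstat together while the load runs:

        # dump the interrupt vector -> CPU assignments (x86; format varies by release)
        echo "::interrupts" | mdb -k

        # sample both tools side by side during the test
        intrstat 5 &
        mpstat 5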


>Can't do that for NFS workload. Is the
> below output for NFS 
> (since there is virtually no usr time)?

I tried with NFS with similar results; to take NFS and RPC out of the picture I am trying with rcp (see above).

> 
> You might want to tune via the bnx.conf file, looking
> at the tunables
> focusing on receive segments per interrupt. Also,
> maybe more buffers if
> you increase the segments per interrupt, so you don't
> run out.

I will try that and see if it makes any difference.
I haven't found any bnx documentation so far; any idea whether there are notes on the bnx driver anywhere?
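
For reference, the general driver.conf workflow would look roughly like this (a sketch only; the parameter name below is made up, just a placeholder for whatever receive-coalescing tunables the shipped bnx.conf documents in its comments):

        # edit the driver config; "rx_frames_per_intr" is a hypothetical placeholder,
        # use the names actually documented in /kernel/drv/bnx.conf
        vi /kernel/drv/bnx.conf

        # ask the system to re-read bnx.conf (a reboot may still be needed
        # for the driver to pick up the new values at attach time)
        update_drv bnx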

> 
> I noticed that the comments for the default values
> and the actual values
> don't match. If you follow the suggestions from
> others to place a
> service call, you may want to include that as well.

I will do that; for now I am just hoping to distribute the interrupts with the right tuning.

> 
> Steffen
> 
> PS. what type of system is this?

2.33GHz, 8-CPU machine, 16GB memory:
a Dell PowerEdge 2950.

From prtconf:
 "name='brand-string' type=string items=1
                     value='Intel(r) Xeon(r) CPU           E5345  @ 2.33GHz' "

thanks, som.

> 
> > CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
> >   0    0   0 30585 39533  239 2961   43  376  332    0   586    0  38   0  62
> >   1    0   0 36700 37833    0 2685   28  363  260    0   673    1  39   0  60
> >   2    0   0 59668 33989  101 1162   11   67  638    0  1062    1  49   0  50
> >   3    0   0 63068 33893    0 1156   12   44  376    0  1143    0  50   0  50
> >   4    0   0 28862 38099    3  564    4   23  269    0   529    0  37   0  63
> >   5    0   0 20677 39201    1  411    4   20  207    0   377    1  30   0  69
> >   6    0   0  3480 47606 5717   67   14    7  513    0   119    0  92   0   8
> >   7    0   0 51324 35368    0 1076    3   42  277    0   901    0  42   0  58
> 
--
This message posted from opensolaris.org
_______________________________________________
networking-discuss mailing list
[email protected]
