Re: [PATCH v2] net: Do not enable tx-nocache-copy by default

2014-01-07 Thread David Miller
From: Benjamin Poirier <bpoir...@suse.de>
Date: Tue,  7 Jan 2014 10:11:10 -0500

> There are many cases where this feature does not improve performance, or even
> reduces it.
> 
> For example, here are the results from tests that I've run using 3.12.6 on one
> Intel Xeon W3565 and one i7 920 connected by ixgbe adapters. The results are
> from the Xeon, but they're similar on the i7. All numbers report the
> mean±stddev over 10 runs of 10s.
> 
> 1) latency tests similar to what is described in "c6e1a0d net: Allow no-cache
> copy from user on transmit"
> There is no statistically significant difference between tx-nocache-copy
> on/off.
> nic irqs spread out (one queue per cpu)
 ...
> CC: Tom Herbert <therb...@google.com>
> Signed-off-by: Benjamin Poirier <bpoir...@suse.de>

Looks good, applied, thanks.


[PATCH v2] net: Do not enable tx-nocache-copy by default

2014-01-07 Thread Benjamin Poirier
There are many cases where this feature does not improve performance, or even
reduces it.

For example, here are the results from tests that I've run using 3.12.6 on one
Intel Xeon W3565 and one i7 920 connected by ixgbe adapters. The results are
from the Xeon, but they're similar on the i7. All numbers report the
mean±stddev over 10 runs of 10s.

1) latency tests similar to what is described in commit c6e1a0d ("net: Allow
no-cache copy from user on transmit")
There is no statistically significant difference between tx-nocache-copy
on/off.
nic irqs spread out (one queue per cpu)

200x netperf -r 1400,1
tx-nocache-copy off
692000±1000 tps
50/90/95/99% latency (us): 275±2/643.8±0.4/799±1/2474.4±0.3
tx-nocache-copy on
693000±1000 tps
50/90/95/99% latency (us): 274±1/644.1±0.7/800±2/2474.5±0.7

200x netperf -r 14000,14000
tx-nocache-copy off
86450±80 tps
50/90/95/99% latency (us): 334.37±0.02/838±1/2100±20/3990±40
tx-nocache-copy on
86110±60 tps
50/90/95/99% latency (us): 334.28±0.01/837±2/2110±20/3990±20
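
For reference, the latency runs above could be approximated along the following
lines (a rough sketch, not the exact harness used here; the interface name, the
peer hostname, and the netperf omni output selectors are assumptions):

    PEER=peer-host                  # hypothetical host running netserver
    for state in off on; do
            ethtool -K eth0 tx-nocache-copy $state
            for i in $(seq 200); do
                    # 10 s request/response run: 1400-byte request, 1-byte response
                    netperf -H "$PEER" -t TCP_RR -l 10 -P 0 -- -r 1400,1 \
                            -o throughput,p50_latency,p90_latency,p99_latency
            done
    done

The per-run transaction rates and latency percentiles are then aggregated into
the mean±stddev figures reported above.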

2) single stream throughput tests
tx-nocache-copy leads to higher service demand

                        throughput  cpu0       cpu1         demand
                        (Mb/s)      (Gcycle)   (Gcycle)     (cycle/B)

nic irqs and netperf on cpu0 (1x netperf -T0,0 -t omni -- -d send)

tx-nocache-copy off     9402±5      9.4±0.2                 0.80±0.01
tx-nocache-copy on      9403±3      9.85±0.04               0.838±0.004

nic irqs on cpu0, netperf on cpu1 (1x netperf -T1,1 -t omni -- -d send)

tx-nocache-copy off     9401±5      5.83±0.03  5.0±0.1      0.923±0.007
tx-nocache-copy on      9404±2      5.74±0.03  5.523±0.009  0.958±0.002
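
As a sanity check on the demand column (assuming the Gcycle columns are totals
over the 10 s run), busy cycles divided by bytes transferred reproduces it; for
example, for the second configuration with tx-nocache-copy off:

    # (5.83 + 5.0) Gcycle / (9401 Mb/s * 10 s / 8 bits per byte) ~= 0.92 cycle/B
    echo '(5.83 + 5.0) * 10^9 / (9401 * 10^6 * 10 / 8)' | bc -l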

As a second example, here are some results from Eric Dumazet with the latest
net-next. tx-nocache-copy also leads to higher service demand.

(cpu is Intel(R) Xeon(R) CPU X5660  @ 2.80GHz)

lpq83:~# ./ethtool -K eth0 tx-nocache-copy on
lpq83:~# perf stat ./netperf -H lpq84 -c
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to lpq84.prod.google.com () port 0 AF_INET
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % U      us/KB   us/KB

 87380  16384  16384    10.00      9407.44   2.50     -1.00    0.522   -1.000

 Performance counter stats for './netperf -H lpq84 -c':

       4282.648396 task-clock                #    0.423 CPUs utilized
             9,348 context-switches          #    0.002 M/sec
                88 CPU-migrations            #    0.021 K/sec
               355 page-faults               #    0.083 K/sec
    11,812,797,651 cycles                    #    2.758 GHz                      [82.79%]
     9,020,522,817 stalled-cycles-frontend   #   76.36% frontend cycles idle     [82.54%]
     4,579,889,681 stalled-cycles-backend    #   38.77% backend  cycles idle     [67.33%]
     6,053,172,792 instructions              #    0.51  insns per cycle
                                             #    1.49  stalled cycles per insn  [83.64%]
       597,275,583 branches                  #  139.464 M/sec                    [83.70%]
         8,960,541 branch-misses             #    1.50% of all branches          [83.65%]

      10.128990264 seconds time elapsed

lpq83:~# ./ethtool -K eth0 tx-nocache-copy off
lpq83:~# perf stat ./netperf -H lpq84 -c
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to lpq84.prod.google.com () port 0 AF_INET
Recv   Send    Send                          Utilization       Service Demand
Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
Size   Size    Size     Time     Throughput  local    remote   local   remote
bytes  bytes   bytes    secs.    10^6bits/s  % S      % U      us/KB   us/KB

 87380  16384  16384    10.00      9412.45   2.15     -1.00    0.449   -1.000

 Performance counter stats for './netperf -H lpq84 -c':

       2847.375441 task-clock                #    0.281 CPUs utilized
            11,632 context-switches          #    0.004 M/sec
                49 CPU-migrations            #    0.017 K/sec
               354 page-faults               #    0.124 K/sec
     7,646,889,749 cycles                    #    2.686 GHz                      [83.34%]
     6,115,050,032 stalled-cycles-frontend   #   79.97% frontend cycles idle     [83.31%]
     1,726,460,071 stalled-cycles-backend    #   22.58% backend  cycles idle     [66.55%]
     2,079,702,453 instructions              #    0.27  insns per cycle
                                             #    2.94  stalled cycles per insn  [83.22%]
       363,773,213 branches                  #  127.757 M/sec                    [83.29%]
         4,242,732 branch-misses             #    1.17% of all branches          [83.51%]

      10.128449949 seconds time elapsed
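
The same back-of-the-envelope cycles-per-byte estimate can be made from the two
perf runs above (rough numbers only: the counters are multiplexed and the cycle
counts cover the whole netperf process, not just the copy path):

    # tx-nocache-copy on:  ~1.00 cycle/B on the send side
    echo '11812797651 / (9407.44 * 10^6 * 10 / 8)' | bc -l
    # tx-nocache-copy off: ~0.65 cycle/B on the send side
    echo '7646889749 / (9412.45 * 10^6 * 10 / 8)' | bc -l

which is consistent with the lower send-side service demand (0.449 vs 0.522
us/KB) reported by netperf itself.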
