Re: [PATCH] Use BPF to distribute packet to different work thread.

Mikhail Isachenkov Mon, 21 Sep 2020 04:29:47 -0700

Hi Liu Qiao,

We tested an early version of this patch with the same BPF code (on AWS cloud, without ADQ-capable cards) on relatively small payloads and found no significant difference. We'd like to retest it with a large payload size; could you please elaborate a bit more on how you performed the test? I mean the 'nginx -T' output, the number of CPU cores and the CPU model, the test scripts, the 1-megabyte file, and any system tuning parameters. One caveat was that wrk itself may produce significant client load, and the most effective way to distribute client load between CPU cores is to run wrk in one thread via taskset.
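
As a rough illustration of the taskset suggestion, the following Python sketch builds a command line that pins a single-threaded wrk to one CPU. The URL, CPU number, and duration are hypothetical placeholders, and reading "one thread" as `-t 1` on a single pinned CPU is an assumption; only the 10000 connections figure comes from this thread.

```python
import shlex

def pinned_wrk_command(url, cpu=0, connections=10000, duration="30s"):
    """Build an argv list that runs one wrk thread pinned to a single CPU."""
    return [
        "taskset", "-c", str(cpu),   # restrict the process to one CPU
        "wrk",
        "-t", "1",                   # a single wrk thread
        "-c", str(connections),      # open connections (10000 in this thread)
        "-d", duration,              # hypothetical test duration
        "--latency",                 # print the latency distribution
        url,
    ]

# Hypothetical target URL; substitute the real server and 1-megabyte object.
cmd = pinned_wrk_command("http://server.example/1m.bin")
print(shlex.join(cmd))
```

Several such processes, each pinned to a different CPU, could then be started to spread the client load explicitly.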

Another big caveat that I found during my tests is the strange behavior of this BPF code: when the client and server run on the same machine, all requests were served by one nginx worker process. Did you try running wrk and nginx locally in your test?


Thanks in advance!

15.09.2020 05:08, Liu, Qiao wrote:

Below is a comparison of five test runs: 112 threads, 10000 connections, HTTP
requests for a 1M object. P99 seems to improve considerably, and Max is also reduced.



                    AVG       Stdev        Max       P99
BPF      test 1     1.32s     447.09ms     5.48s     2.82s
         test 2     1.39s     513.8ms      9.42s     3.1s
         test 3     1.4s      341.38ms     5.63s     2.55s
         test 4     1.41s     407.45ms     6.96s     2.77s
         test 5     1.29s     644.81ms     9.45s     3.74s
         Average    1.362s    470.906ms    7.388s    2.996s

NonBPF   test 1     1.48s     916.88ms     9.44s     5.08s
         test 2     1.43s     658.48ms     9.54s     3.92s
         test 3     1.41s     650.38ms     8.63s     3.59s
         test 4     1.29s     1010ms       10s       5.21s
         test 5     1.31s     875.01ms     9.53s     4.39s
         Average    1.384s    822.15ms     9.428s    4.438s


Thanks
LQ
-----Original Message-----
From: nginx-devel <[email protected]> On Behalf Of Liu, Qiao
Sent: Monday, September 14, 2020 9:18 AM
To: [email protected]
Subject: RE: [PATCH] Use BPF to distribute packet to different work thread.

Hi, Maxim Dounin:
Thanks for your reply. This server was randomly selected; we just ran the BPF and
no-BPF tests on it. I think the latency is determined by the server configuration
and is not related to the BPF patch. Also, the NIC of the server is Mellanox, not
ADQ-capable hardware. We will do more tests.
Thanks
LQ

-----Original Message-----
From: nginx-devel <[email protected]> On Behalf Of Maxim Dounin
Sent: Monday, September 14, 2020 7:40 AM
To: [email protected]
Subject: Re: [PATCH] Use BPF to distribute packet to different work thread.

Hello!

On Fri, Sep 11, 2020 at 05:41:47AM +0000, Liu, Qiao wrote:

Hi, Vladimir Homutov:
Below is our wrk test output with BPF enabled:

   112 threads and 10000 connections
   Thread Stats   Avg      Stdev     Max   +/- Stdev
     Latency   608.23ms  820.71ms  10.00s    87.48%
     Connect    16.52ms   54.53ms   1.99s    94.73%
     Delay     153.13ms  182.17ms   2.00s    90.74%
     Req/Sec   244.79    142.32     1.99k    68.40%
   Latency Distribution
   50.00%  293.50ms
   75.00%  778.33ms
   90.00%    1.61s
   99.00%    3.71s
   99.90%    7.03s
   99.99%    8.94s
   Connect Distribution
   50.00%    1.93ms
   75.00%    2.85ms
   90.00%   55.76ms
   99.00%  229.19ms
   99.90%  656.79ms
   99.99%    1.43s
   Delay Distribution
   50.00%  110.96ms
   75.00%  193.67ms
   90.00%  321.77ms
   99.00%  959.27ms
   99.90%    1.57s
   99.99%    1.91s
Compare this with the output without BPF but with reuseport enabled, as below:

   112 threads and 10000 connections
   Thread Stats   Avg      Stdev     Max   +/- Stdev
     Latency   680.50ms  943.69ms  10.00s    87.18%
     Connect    58.44ms  238.08ms   2.00s    94.58%
     Delay     158.84ms  256.28ms   2.00s    90.92%
     Req/Sec   244.51    151.00     1.41k    69.67%
   Latency Distribution
   50.00%  317.61ms
   75.00%  913.52ms
   90.00%    1.90s
   99.00%    4.30s
   99.90%    6.52s
   99.99%    8.80s
   Connect Distribution
   50.00%    1.88ms
   75.00%    2.21ms
   90.00%   55.94ms
   99.00%    1.45s
   99.90%    1.95s
   99.99%    2.00s
   Delay Distribution
   50.00%   73.01ms
   75.00%  190.40ms
   90.00%  387.01ms
   99.00%    1.34s
   99.90%    1.86s
   99.99%    1.99s


The above results show an almost 20% percent latency
reduction: P99 latency with BPF is 3.71s, but without BPF it is 4.3s.
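
For reference, the relative reduction implied by the quoted figures can be checked with a short calculation. This is only a sketch in Python; the helper name is illustrative, and the inputs are the P99 and average latency values from the two wrk outputs above.

```python
def relative_reduction(baseline, improved):
    """Fraction by which `improved` is lower than `baseline`."""
    return (baseline - improved) / baseline

# P99 latency in seconds: without BPF vs. with BPF.
p99_reduction = relative_reduction(4.30, 3.71)
# Average latency in milliseconds: without BPF vs. with BPF.
avg_reduction = relative_reduction(680.50, 608.23)

print(f"P99: {p99_reduction:.1%}, average: {avg_reduction:.1%}")
```

On these single-run numbers the P99 reduction works out to roughly 14% and the average latency reduction to roughly 11%.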


Thank you for the results.

Given that latency stdev is way higher than the average latency, I don't think the 
"20% percent latency reduction" observed is statistically significant.  Please 
try running several tests and use ministat(1) to check the results.
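
A minimal sketch of such a check, assuming Python is at hand instead of ministat(1): it computes Welch's t statistic and its approximate degrees of freedom for the P99 values of the five-run comparison posted later in this thread. This is a stand-in for illustration, not ministat itself, and the function name is hypothetical.

```python
from math import sqrt
from statistics import mean, variance

# P99 latency in seconds from the five-run comparison table.
bpf_p99 = [2.82, 3.10, 2.55, 2.77, 3.74]
non_bpf_p99 = [5.08, 3.92, 3.59, 5.21, 4.39]

def welch_t(a, b):
    """Return Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)   # squared standard error of sample a
    vb = variance(b) / len(b)   # squared standard error of sample b
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

t_stat, dof = welch_t(bpf_p99, non_bpf_p99)
print(f"t = {t_stat:.2f}, df = {dof:.1f}")
```

The resulting statistic would then be compared against a Student's t table at the chosen confidence level; with only five runs per configuration the estimate remains coarse.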

Also, the latency values look very high, and request rate very low.  What's on 
the server side?

--
Maxim Dounin
http://mdounin.ru/
_______________________________________________
nginx-devel mailing list
[email protected]
http://mailman.nginx.org/mailman/listinfo/nginx-devel


--
Best regards,
Mikhail Isachenkov
NGINX Professional Services
