Hi Willy,

My systems were out of rotation for some other tests, so I did not get
to this until now. I have just pulled the latest bits and tested. Regarding
maxconn, I simply kept maxconn at 1 million in global/defaults and have
this line in the backend section:
    default-server maxconn 1000000
I have not seen the Queue/Max you mentioned earlier.
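In case it is useful for comparison, the relevant fragment of my
configuration looks roughly like this (a sketch only; the backend name,
server name and address are placeholders, not my real setup):

    global
        maxconn 1000000

    defaults
        maxconn 1000000

    backend bk_test
        balance leastconn
        default-server maxconn 1000000
        server srv1 192.0.2.10:8080   # placeholder address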

The FD time has gone down to zero, but the LB time has increased by
about 50% since last time (7700 ns to 11600 ns; I am using 'balance
leastconn').
The test was run for 1 minute:
    $ wrk -c 4800 -t 48 -d 60s http://www.flipkart.com/128

The results were for 32 threads, which is the same configuration I tested
with earlier. Both of these tests were done with threads pinned to NUMA-1
cores (cores 1, 3, 5, ... 47) and IRQs to NUMA-0 (cores 0, 2, 4, ... 46).
However, the CPUs recycle from 1-47 back to 1-15 for the thread pinning,
so that may explain the much higher lock numbers I am seeing. When I
changed this to use all CPUs (0-31), the LBPRM lock took 74339.117 ns per
operation, but performance dropped from 210K to 80K.
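For what it's worth, the pinning above was done externally; the same
layout could also be expressed in the haproxy configuration itself (a
sketch, assuming a master recent enough to support per-thread cpu-map;
the core numbering matches my topology and may differ on yours):

    global
        nbthread 32
        # bind threads 1-24 one-by-one to the 24 odd (NUMA-1) cores
        cpu-map auto:1/1-24 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47
        # the remaining 8 threads wrap back to cores 1-15, so some
        # cores end up carrying two threads
        cpu-map auto:1/25-32 1 3 5 7 9 11 13 15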

Overall, I am not at ease with threading, or I will have to settle for 12
threads on the 12 non-hyperthreaded cores of a single socket.

I am inlining the lock output at the end of this mail for the case where
all threads are pinned to NUMA-1 cores (and hence two threads share the
same core on some cores).

Thanks,
- Krishna

Stats about Lock FD:
# write lock  : 2
# write unlock: 2 (0)
# wait time for write     : 0.001 msec
# wait time for write/lock: 302.000 nsec
# read lock   : 0
# read unlock : 0 (0)
# wait time for read      : 0.000 msec
# wait time for read/lock : 0.000 nsec
Stats about Lock TASK_RQ:
# write lock  : 373317
# write unlock: 373317 (0)
# wait time for write     : 341.875 msec
# wait time for write/lock: 915.775 nsec
# read lock   : 0
# read unlock : 0 (0)
# wait time for read      : 0.000 msec
# wait time for read/lock : 0.000 nsec
Stats about Lock TASK_WQ:
# write lock  : 373432
# write unlock: 373432 (0)
# wait time for write     : 491.524 msec
# wait time for write/lock: 1316.235 nsec
# read lock   : 0
# read unlock : 0 (0)
# wait time for read      : 0.000 msec
# wait time for read/lock : 0.000 nsec
Stats about Lock LISTENER:
# write lock  : 1248
# write unlock: 1248 (0)
# wait time for write     : 0.295 msec
# wait time for write/lock: 236.341 nsec
# read lock   : 0
# read unlock : 0 (0)
# wait time for read      : 0.000 msec
# wait time for read/lock : 0.000 nsec
Stats about Lock PROXY:
# write lock  : 12524202
# write unlock: 12524202 (0)
# wait time for write     : 20979.972 msec
# wait time for write/lock: 1675.154 nsec
# read lock   : 0
# read unlock : 0 (0)
# wait time for read      : 0.000 msec
# wait time for read/lock : 0.000 nsec
Stats about Lock SERVER:
# write lock  : 50100330
# write unlock: 50100330 (0)
# wait time for write     : 76908.311 msec
# wait time for write/lock: 1535.086 nsec
# read lock   : 0
# read unlock : 0 (0)
# wait time for read      : 0.000 msec
# wait time for read/lock : 0.000 nsec
Stats about Lock LBPRM:
# write lock  : 50096808
# write unlock: 50096808 (0)
# wait time for write     : 584505.012 msec
# wait time for write/lock: 11667.510 nsec
# read lock   : 0
# read unlock : 0 (0)
# wait time for read      : 0.000 msec
# wait time for read/lock : 0.000 nsec
Stats about Lock BUF_WQ:
# write lock  : 35653802
# write unlock: 35653802 (0)
# wait time for write     : 80406.420 msec
# wait time for write/lock: 2255.199 nsec
# read lock   : 0
# read unlock : 0 (0)
# wait time for read      : 0.000 msec
# wait time for read/lock : 0.000 nsec
Stats about Lock STRMS:
# write lock  : 9602
# write unlock: 9602 (0)
# wait time for write     : 5.613 msec
# wait time for write/lock: 584.594 nsec
# read lock   : 0
# read unlock : 0 (0)
# wait time for read      : 0.000 msec
# wait time for read/lock : 0.000 nsec
Stats about Lock VARS:
# write lock  : 37596611
# write unlock: 37596611 (0)
# wait time for write     : 2285.148 msec
# wait time for write/lock: 60.781 nsec
# read lock   : 0
# read unlock : 0 (0)
# wait time for read      : 0.000 msec
# wait time for read/lock : 0.000 nsec


On Mon, Oct 15, 2018 at 11:14 PM Willy Tarreau <[email protected]> wrote:

> Hi again,
>
> finally I got rid of the FD lock for single-threaded accesses (most of
> them), and based on Olivier's suggestion, I implemented a per-thread
> wait queue, and cache-aligned some list heads to avoid undesired cache
> line sharing. For me all of this combined resulted in a performance
> increase of 25% on a 12-threads workload. I'm interested in your test
> results, all of this is in the latest master.
>
> If you still see LBPRM a lot, I can send you the experimental patch
> to move the element inside the tree without unlinking/relinking it
> and we can see if that provides any benefit or not (I'm not convinced).
>
> Cheers,
> Willy
>