Hi Willy,
My systems were out of rotation for some other tests, so I did not get
to this until now. I have just pulled the latest bits and tested.
Regarding maxconn, I simply kept maxconn in global/defaults at 1 million
and have this line in the backend section:
default-server maxconn 1000000
I have not seen the Queue/Max you mentioned earlier.
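For reference, the relevant sections of my configuration look roughly like
this (the backend name, server name and address below are placeholders, not
my actual setup):

```
global
    maxconn 1000000

defaults
    maxconn 1000000

backend bk_test
    # leastconn is the balance algorithm used in these tests
    balance leastconn
    default-server maxconn 1000000
    server srv1 10.0.0.1:80
```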
The FD time has gone down to zero, but the LB time has increased by
about 50% since last time (7700 ns to 11600 ns; I am using 'balance
leastconn').
The test was run for 1 minute:
$ wrk -c 4800 -t 48 -d 60s http://www.flipkart.com/128
The results were for 32 threads, which is the same configuration I tested
with earlier. Both of these tests were done with the threads pinned to
NUMA-1 cores (cores 1, 3, 5, ..., 47) and the IRQs pinned to NUMA-0 cores
(0, 2, 4, ..., 46). However, with 32 threads the CPU assignment recycles
from 1-47 back to 1-15, so some cores run two threads; that may explain
the much higher lock numbers I am seeing. When I changed this to use all
CPUs (0-31), the LBPRM lock took 74339.117 ns per operation, but
performance dropped from 210K to 80K.
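In case it helps reproduce, the NUMA-1 pinning can be expressed with
cpu-map in the global section; this is only a sketch of the case above
(the exact binding could equally be done externally with taskset):

```
global
    nbthread 32
    # bind threads 1-32 one-to-one onto the odd (NUMA-1) cores;
    # with only 24 odd cores available, the assignment wraps back
    # to cores 1-15, doubling up threads on those cores
    cpu-map auto:1/1-32 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47
```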
Overall, I am not at ease with threading, or I will have to settle for 12
threads on the 12 non-hyperthreaded cores of a single socket.
I am inlining the lock stats output for the case where all threads are
pinned to NUMA-1 cores (and hence two threads share the same core on some
cores) at the end of this mail.
Thanks,
- Krishna
Stats about Lock FD:
# write lock : 2
# write unlock: 2 (0)
# wait time for write : 0.001 msec
# wait time for write/lock: 302.000 nsec
# read lock : 0
# read unlock : 0 (0)
# wait time for read : 0.000 msec
# wait time for read/lock : 0.000 nsec
Stats about Lock TASK_RQ:
# write lock : 373317
# write unlock: 373317 (0)
# wait time for write : 341.875 msec
# wait time for write/lock: 915.775 nsec
# read lock : 0
# read unlock : 0 (0)
# wait time for read : 0.000 msec
# wait time for read/lock : 0.000 nsec
Stats about Lock TASK_WQ:
# write lock : 373432
# write unlock: 373432 (0)
# wait time for write : 491.524 msec
# wait time for write/lock: 1316.235 nsec
# read lock : 0
# read unlock : 0 (0)
# wait time for read : 0.000 msec
# wait time for read/lock : 0.000 nsec
Stats about Lock LISTENER:
# write lock : 1248
# write unlock: 1248 (0)
# wait time for write : 0.295 msec
# wait time for write/lock: 236.341 nsec
# read lock : 0
# read unlock : 0 (0)
# wait time for read : 0.000 msec
# wait time for read/lock : 0.000 nsec
Stats about Lock PROXY:
# write lock : 12524202
# write unlock: 12524202 (0)
# wait time for write : 20979.972 msec
# wait time for write/lock: 1675.154 nsec
# read lock : 0
# read unlock : 0 (0)
# wait time for read : 0.000 msec
# wait time for read/lock : 0.000 nsec
Stats about Lock SERVER:
# write lock : 50100330
# write unlock: 50100330 (0)
# wait time for write : 76908.311 msec
# wait time for write/lock: 1535.086 nsec
# read lock : 0
# read unlock : 0 (0)
# wait time for read : 0.000 msec
# wait time for read/lock : 0.000 nsec
Stats about Lock LBPRM:
# write lock : 50096808
# write unlock: 50096808 (0)
# wait time for write : 584505.012 msec
# wait time for write/lock: 11667.510 nsec
# read lock : 0
# read unlock : 0 (0)
# wait time for read : 0.000 msec
# wait time for read/lock : 0.000 nsec
Stats about Lock BUF_WQ:
# write lock : 35653802
# write unlock: 35653802 (0)
# wait time for write : 80406.420 msec
# wait time for write/lock: 2255.199 nsec
# read lock : 0
# read unlock : 0 (0)
# wait time for read : 0.000 msec
# wait time for read/lock : 0.000 nsec
Stats about Lock STRMS:
# write lock : 9602
# write unlock: 9602 (0)
# wait time for write : 5.613 msec
# wait time for write/lock: 584.594 nsec
# read lock : 0
# read unlock : 0 (0)
# wait time for read : 0.000 msec
# wait time for read/lock : 0.000 nsec
Stats about Lock VARS:
# write lock : 37596611
# write unlock: 37596611 (0)
# wait time for write : 2285.148 msec
# wait time for write/lock: 60.781 nsec
# read lock : 0
# read unlock : 0 (0)
# wait time for read : 0.000 msec
# wait time for read/lock : 0.000 nsec
On Mon, Oct 15, 2018 at 11:14 PM Willy Tarreau <[email protected]> wrote:
> Hi again,
>
> finally I got rid of the FD lock for single-threaded accesses (most of
> them), and based on Olivier's suggestion, I implemented a per-thread
> wait queue, and cache-aligned some list heads to avoid undesired cache
> line sharing. For me all of this combined resulted in a performance
> increase of 25% on a 12-threads workload. I'm interested in your test
> results, all of this is in the latest master.
>
> If you still see LBPRM a lot, I can send you the experimental patch
> to move the element inside the tree without unlinking/relinking it
> and we can see if that provides any benefit or not (I'm not convinced).
>
> Cheers,
> Willy
>