Hi again, I finally got rid of the FD lock for single-threaded accesses (which is most of them). Based on Olivier's suggestion, I also implemented a per-thread wait queue and cache-aligned some list heads to avoid undesired cache line sharing. For me, all of this combined resulted in a 25% performance increase on a 12-thread workload. I'm interested in your test results; all of this is in the latest master.
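For anyone not familiar with the cache-alignment trick, here is a minimal C11 sketch of the general idea. It is not the actual code from master: the names `list_head`, `per_thread_queue`, `wait_queues` and `MAX_THREADS` are made up for illustration, and the 64-byte cache line is an assumption that holds on most x86-64 machines but not everywhere.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE  64   /* assumed cache line size, not detected at run time */
#define MAX_THREADS 12   /* hypothetical thread count for this sketch */

/* Hypothetical circular doubly-linked list head. */
struct list_head {
    struct list_head *next, *prev;
};

/* One wait queue per thread. alignas() on the member raises the alignment
 * of the whole struct, so its size is padded up to a multiple of CACHE_LINE
 * and consecutive array elements never share a cache line. */
struct per_thread_queue {
    alignas(CACHE_LINE) struct list_head head;
};

static_assert(sizeof(struct per_thread_queue) % CACHE_LINE == 0,
              "each queue must occupy whole cache lines");

static struct per_thread_queue wait_queues[MAX_THREADS];

/* An empty circular list points to itself in both directions. */
static void queue_init(struct per_thread_queue *q)
{
    q->head.next = &q->head;
    q->head.prev = &q->head;
}

static int queue_is_empty(const struct per_thread_queue *q)
{
    return q->head.next == &q->head;
}

static void init_wait_queues(void)
{
    for (size_t i = 0; i < MAX_THREADS; i++)
        queue_init(&wait_queues[i]);
}

/* Index of the cache line an address falls into (under the assumed size). */
static uintptr_t cache_line_of(const void *p)
{
    return (uintptr_t)p / CACHE_LINE;
}
```

Each thread would only touch `wait_queues[its_own_id]`, so a write by one thread does not invalidate the line holding another thread's list head. The cost is some wasted padding per entry.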
If you still see LBPRM a lot, I can send you the experimental patch that moves the element inside the tree without unlinking/relinking it, and we can see whether that provides any benefit (I'm not convinced).

Cheers,
Willy