On 7/31/19 6:32 PM, Zebediah Figura wrote:
On 7/31/19 8:22 PM, Zebediah Figura wrote:
On 7/31/19 7:45 PM, Thomas Gleixner wrote:
If I assume a maximum of 65 futexes, which got mentioned in one of the
replies, then this will allocate 7280 bytes for the futex_q array alone
with a stock Debian config, i.e. without any of the debug options
enabled which would bloat the struct. Adding the futex_wait_block array
into the same allocation makes it larger than 8K, which already exceeds
the limit of SLUB kmem caches and forces the whole thing into the page
allocator directly.
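
(Worked out, assuming the 16-byte, on 64-bit, futex_wait_block entries
from the patch: 7280 / 65 = 112 bytes per futex_q, and 7280 + 65 * 16 =
8320 bytes for the combined allocation, just over the 8192-byte SLUB
limit.)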

This sucks.

I'm also confused by the comment in one of the mails about the maximum
of 64 resulting in 65 futexes.

Can you please explain what exactly you are trying to do on the user
space side?

The extra futex comes from the fact that there are a couple of, as it
were, out-of-band ways to wake up a thread on Windows. [Specifically, a
thread can enter an "alertable" wait in which case it will be woken up
by a request from another thread to execute an "asynchronous procedure
call".] It's easiest for us to just add another futex to the list in
that case.
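
As a rough sketch of how that looks in practice (hypothetical code; the
helper name and object layout are made up for illustration, and the
opcode value is the one used by this patch series):

#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

#define FUTEX_WAIT_MULTIPLE 13  /* opcode from the proposed patch */

struct futex_wait_block {
    void *uaddr;
    unsigned int val;
    unsigned int bitset;
};

/* Wait on nobjs object futexes plus one per-thread "alert" futex,
 * which an APC posted from another thread bumps and wakes. */
static int wait_objects_alertable(unsigned int **objs, int nobjs,
                                  unsigned int *alert)
{
    struct futex_wait_block blocks[nobjs + 1];
    int i;

    for (i = 0; i < nobjs; i++) {
        blocks[i].uaddr  = objs[i];
        blocks[i].val    = 0;    /* sleep only while unsignaled */
        blocks[i].bitset = ~0u;
    }
    blocks[nobjs].uaddr  = alert;    /* the extra futex */
    blocks[nobjs].val    = 0;
    blocks[nobjs].bitset = ~0u;

    /* Returns once any of the nobjs + 1 futexes is woken. */
    return syscall(SYS_futex, blocks, FUTEX_WAIT_MULTIPLE,
                   nobjs + 1, NULL, NULL, 0);
}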

To be clear, the 64/65 distinction is an implementation detail that's
pretty much outside the scope of this discussion. I should have just
said 65 directly. Sorry about that.


I'd also point out, for whatever it's worth, that while 64 is a hard
limit, real applications almost never come close to it. By far the most
common number of primitives to select on is one. Performance-critical
code tends never to wait on more than three. The most I've ever seen is
twelve.

If you'd like to see the user-side source, most of the relevant code is
at [1], in particular the functions __fsync_wait_objects() [line 712]
and do_single_wait() [line 655]. Please feel free to ask for further
clarification.

[1]
https://github.com/ValveSoftware/wine/blob/proton_4.11/dlls/ntdll/fsync.c
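
For the other direction, the signaling side is ordinary futex usage;
roughly (again a sketch, not code lifted from fsync.c):

#include <limits.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Signal an object: publish the state change, then wake every waiter
 * whose wait list includes this futex word. */
static void signal_object(atomic_uint *signaled)
{
    atomic_store(signaled, 1);
    syscall(SYS_futex, signaled, FUTEX_WAKE, INT_MAX, NULL, NULL, 0);
}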

In addition, here's an example of how I think it might be useful to
expose it to apps at large in a way that's compatible with existing
pthread mutexes:

https://github.com/Plagman/glibc/commit/3b01145fa25987f2f93e7eda7f3e7d0f2f77b290

This patch hasn't received nearly as much testing as the Wine fsync
code path, but that functionality would give the thread pool code in
our game engine a more CPU-efficient way to sleep; today we use eventfd
for that.

For this use case, I think the expected upper bound for the per-op
futex count would be of the same order of magnitude as the logical CPU
count on the target machine, similar to the Wine use case.
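
To make that concrete, a worker in such a pool might sleep on one futex
per work queue with something like the following (a hypothetical sketch
against the futex_wait_block interface from this series, not code from
our engine):

#include <stdatomic.h>
#include <sys/syscall.h>
#include <unistd.h>

#define FUTEX_WAIT_MULTIPLE 13  /* opcode from the proposed patch */

struct futex_wait_block {
    void *uaddr;
    unsigned int val;
    unsigned int bitset;
};

#define NQUEUES 8  /* on the order of the logical CPU count */

static atomic_uint queue_seq[NQUEUES]; /* bumped by producers on enqueue */

/* Sleep until any queue's sequence number moves past the value this
 * worker last observed. Spurious wakeups are harmless; the caller
 * re-scans all the queues afterwards either way. */
static void worker_sleep(const unsigned int last_seen[NQUEUES])
{
    struct futex_wait_block blocks[NQUEUES];
    int i;

    for (i = 0; i < NQUEUES; i++) {
        blocks[i].uaddr  = &queue_seq[i];
        blocks[i].val    = last_seen[i]; /* sleep only if unchanged */
        blocks[i].bitset = ~0u;
    }
    syscall(SYS_futex, blocks, FUTEX_WAIT_MULTIPLE, NQUEUES,
            NULL, NULL, 0);
}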

Thanks,
 - Pierre-Loup