Thank you for your answer.
I understand I can control the number of threads and prevent them from being
bound to specific hardware threads.
Preventing oversubscription of the hardware threads is challenging when using
OpenMP/TBB/OpenSWR in hybrid environments.
I am wondering whether having N single-threaded SWR contexts (where N
corresponds to the number of hardware threads) would be *good enough*,
i.e. not much slower than a single SWR context rendering the tasks serially.
Do you have a take on this? That might do the trick.
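If we go down that road, here is a minimal sketch of the setup, using the
KNOB_MAX_WORKER_THREADS variable mentioned in your reply below (the binary
name is made up, and I am assuming the knob is read process-wide before any
context is created):

```shell
# Limit every SWR context created by this process to a single worker thread,
# so each of the N per-hardware-thread OSMesa contexts stays single-threaded
# and TBB remains the only scheduler deciding where work runs.
export KNOB_MAX_WORKER_THREADS=1
./batch_renderer   # hypothetical application binary
```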
Similar oversubscription problems occur in any application that mixes
threading technologies (Cilk, TBB, OpenMP, …), and there are few solutions
short of rewriting the code to use a single technology.
An alternative solution would be a callback mechanism in OpenSWR that lets
the driver hand its tasks back to the application's own scheduler.
> On 16 May 2018, at 14:34, Cherniak, Bruce <bruce.chern...@intel.com> wrote:
>> On May 14, 2018, at 8:59 AM, Alexandre <alexandre.gauthier-foic...@inria.fr
>> <mailto:alexandre.gauthier-foic...@inria.fr>> wrote:
>> Sorry for the inconvenience if this message is not appropriate for this
>> mailing list.
>> The following is a question for developers of the swr driver of gallium.
>> I am the main developer of a motion graphics application.
>> Our application internally has a dependency graph where each node may run
>> in parallel with the others.
>> We use OpenGL extensively in the implementation of the nodes (for example
>> with Shadertoy).
>> Our application has 2 main requirements:
>> - A GPU backend, mainly for user interaction and fast results
>> - A CPU backend for batch rendering
>> Internally we use OSMesa for CPU backend so that our code is mostly
>> identical for both GPU and CPU paths.
>> However, when it comes to the CPU, our application is heavily
>> multi-threaded: each processing node can potentially run in parallel with
>> others, following the dependency graph.
>> We use Intel TBB to schedule the CPU threads.
>> For each actual hardware thread (not task) we allocate a new OSMesa context
>> so that we can freely multi-thread operators rendering. It works fine with
>> llvmpipe and also SWR so far (with a patch to fix some static variables
>> inside state_trackers/osmesa.c).
>> However, with SWR using its own thread pool, I'm afraid of over-threading
>> introducing a bottleneck in thread scheduling,
>> e.g. on a 32-core processor we may already have, let's say, 24 threads
>> busy on a TBB task on each core, each with 1 OSMesa context.
>> I looked at the code and all those concurrent OSMesa contexts will create a
>> SWR context and each will try to initialise its own thread pool in
>> CreateThreadPool in swr/rasterizer/core/api.cpp
>> Is there a way to have a single "static" thread pool shared across all
>> contexts?
> There is not currently a way to create a single thread-pool shared across all
> contexts. Each context creates unique worker threads.
> However, OpenSWR provides an environment variable, KNOB_MAX_WORKER_THREADS,
> that overrides the default thread allocation.
> Setting this will limit the number of threads created by an OpenSWR context
> *and* prevent the threads from being bound to physical cores.
> Please, give this a try. By adjusting the value, you may find the optimal
> value for your situation.
>> Thank you
>> mesa-dev mailing list