Arsen Arsenović wrote:
Currently, libgomp performs initialization of all threads in a team
in its lead thread, and then releases all threads to do work.  This
means that, before reaching the release, each thread is doing nothing,
waiting for the lead thread to do lots of thread initialization
operations.

This initialization is identical for each thread.
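Here is a minimal pthreads sketch of the idea: the lead thread fills in one shared start record, and each thread, including the lead, initializes only its own implicit task from that record. All types and functions here (`struct icvs`, `struct task`, `struct team`, `struct start_data`, `init_own_task`, `run_team`) are hypothetical, heavily simplified stand-ins, not libgomp's. Also unlike libgomp, which releases pooled threads, the sketch just creates and joins threads.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins; not libgomp's real types.  */
struct icvs { unsigned long nthreads_var; };
struct task { struct task *parent; struct icvs icv; void *taskgroup; };

#define MAX_THREADS 8
struct team { struct task implicit_task[MAX_THREADS]; };

/* Shared, read-only data the lead thread prepares once.  */
struct start_data {
  struct team *team;
  struct task *parent_task;
  struct icvs prev_icvs;
  void *taskgroup;
};

struct worker_arg { struct start_data *sd; unsigned threadid; };

/* Each thread initializes only its own implicit task.  */
static void init_own_task (struct start_data *sd, unsigned threadid)
{
  struct task *t = &sd->team->implicit_task[threadid];
  t->parent = sd->parent_task;
  t->icv = sd->prev_icvs;        /* inherit the parent's ICVs */
  t->taskgroup = sd->taskgroup;
}

static void *worker (void *p)
{
  struct worker_arg *a = p;
  init_own_task (a->sd, a->threadid);
  /* ... the thread's real work would start here ... */
  return NULL;
}

/* Start NTHREADS - 1 workers; the lead thread (id 0) initializes only
   its own task.  Returns 0 on success, -1 if a thread failed to start.  */
static int run_team (struct team *team, struct task *parent, unsigned nthreads)
{
  pthread_t tids[MAX_THREADS];
  struct worker_arg args[MAX_THREADS];
  struct start_data sd = { team, parent, parent->icv, parent->taskgroup };
  unsigned started;
  int rc = 0;

  assert (nthreads >= 1 && nthreads <= MAX_THREADS);
  for (started = 1; started < nthreads; started++)
    {
      args[started] = (struct worker_arg) { &sd, started };
      if (pthread_create (&tids[started], NULL, worker, &args[started]) != 0)
        {
          rc = -1;
          break;
        }
    }
  init_own_task (&sd, 0);
  /* Join every thread that did start, so SD outlives all its readers.  */
  for (unsigned i = 1; i < started; i++)
    pthread_join (tids[i], NULL);
  return rc;
}
```

Because the start record is written before any worker is created and only read afterwards, the workers need no further synchronization to consume it.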

...

+  thr->task = &start_data->team->implicit_task[threadid];
+  gomp_init_task (thr->task, start_data->parent_task, &start_data->prev_icvs);
+  /* TODO(arsen): This should be part of a mechanism that allows us to override
+     nthreads-var with OMP_NUM_THREADS.  But, we currently don't have access to
+     that list on the device.
+
+     thr->task->icv.nthreads_var = ...;  */
+  thr->task->taskgroup = start_data->taskgroup;


For completeness, and as something to add eventually:

OpenMP permits
  OMP_NUM_THREADS_{ALL,DEV,DEV_<num>}
to initialize those values for devices as well, and that environment
variable takes a list of values: each nested parallel region consumes
one value, leaving the rest for the next nesting level.
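The list semantics fit in a few lines of C. This sketch rests on several assumptions. `struct nthreads_list`, `nthreads_first`, `nthreads_inherit` and `nthreads_at_level` are hypothetical names. max-active-levels and the _ALL/_DEV suffix selection are ignored. When only one value remains, it is kept for all deeper levels, following one reading of the OpenMP ICV inheritance rules. With a list of 4,2,1, the outermost region asks for 4 threads, the next for 2, and anything deeper for 1.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical view of an nthreads-var list: remaining values,
   leftmost first.  */
struct nthreads_list { const unsigned long *vals; size_t len; };

/* Value used for the next parallel region the task encounters.  */
static unsigned long nthreads_first (struct nthreads_list l)
{
  assert (l.len > 0);
  return l.vals[0];
}

/* List inherited by the implicit tasks of the new team: drop the first
   value, unless it is the only one left, in which case keep it.  */
static struct nthreads_list nthreads_inherit (struct nthreads_list l)
{
  assert (l.len > 0);
  if (l.len > 1)
    return (struct nthreads_list) { l.vals + 1, l.len - 1 };
  return l;
}

/* Team size requested at nesting LEVEL (0 = outermost) when no
   num_threads clause overrides it.  */
static unsigned long nthreads_at_level (struct nthreads_list l, unsigned level)
{
  while (level-- > 0)
    l = nthreads_inherit (l);
  return nthreads_first (l);
}
```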

GCC passes several environment variables as ICVs to the device, but
the selection is a bit inconsistent: less useful ones are passed on,
while more useful ones aren't. (I think we also don't support ALL, or we
include the host in _DEV; some part wasn't quite right there, partly
because the spec was not yet fully settled back then.)

Additionally, since OpenMP 6.0 (I think), the num_threads clause
accepts a list of values, working like the env var: the leftmost
value is consumed first, leaving the others for inner parallel regions.

There is now also the 'strict' modifier and, upcoming,
  num_threads([strict/relaxed][,][dim(n)][:] ...)
which permits, via 'dim', mimicking what is done in CUDA/HIP programming.

* * *

Arsen Arsenović wrote:
> Andrew Stubbs writes:
>
>> On 05/05/2026 14:14, Arsen Arsenović wrote:
>>> +  /* TODO(arsen): This should be part of a mechanism that allows us to override
>>> +     nthreads-var with OMP_NUM_THREADS.  But, we currently don't have access to
>>> +     that list on the device.
>>> +
>>> +     thr->task->icv.nthreads_var = ...;  */
>>
>> The previous code did write to this field. Why does the new thread not do at
>> least the same? (This does not seem like "no functional changes".)
>
> This write was redundant with gomp_init_task, which does:
...

> On the device, however, that list is always empty, and analogous logic
> was not even present, so the value of nthreads_var is necessarily the
> same as prev_icv->nthreads_var, being written into a field where
> prev_icv->nthreads_var was already copied.
>
> So, it never had an effect.
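The argument quoted above can be checked against a simplified model. The types and the functions `init_task_icvs` and `init_task_icvs_old` are hypothetical, and the level-indexed override is an assumed stand-in for the elided host-side list handling, not a quote of it. With an empty list the override never fires, so the extra store of `prev->nthreads_var` changes nothing. With a non-empty list, the same store would clobber the per-level value, so simply restoring the old write would not implement the TODO.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not libgomp's real types.  */
struct icvs { unsigned long nthreads_var; };
struct nthreads_list { const unsigned long *vals; size_t len; };

/* Model of task initialization: inherit the parent's ICVs, then apply an
   (assumed) per-level override only if the list has an entry for LEVEL.  */
static struct icvs init_task_icvs (const struct icvs *prev,
                                   struct nthreads_list list, unsigned level)
{
  struct icvs icv = *prev;
  if (list.vals != NULL && level < list.len)
    icv.nthreads_var = list.vals[level];
  return icv;
}

/* Model of the old path: the same initialization, followed by a store of
   a value computed without any list logic (as on the device).  */
static struct icvs init_task_icvs_old (const struct icvs *prev,
                                       struct nthreads_list list,
                                       unsigned level)
{
  struct icvs icv = init_task_icvs (prev, list, level);
  unsigned long nthreads_var = prev->nthreads_var;  /* no list consulted */
  icv.nthreads_var = nthreads_var;                  /* the removed write */
  return icv;
}
```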

Should we expand the comment here, or is the current TODO comment
enough? We eventually need to handle this properly (see above).

I guess almost no one will use the environment variable, but I can
well imagine that some users will start using a list of values with 'num_threads'.

Tobias
