Sandra Loosemore wrote:
static inline void gomp_barrier_init (gomp_barrier_t *bar, unsigned count)
{
+ unsigned actual_thread_count = __builtin_gcn_dim_size (1);
+ if (count > actual_thread_count)
+ count = actual_thread_count;
I wonder whether silently reducing the number of threads here will
lead to inconsistencies. Looking at the caller:
--------------
struct gomp_team *
gomp_new_team (unsigned nthreads)
{
...
team = get_last_team (nthreads);
if (team == NULL)
{
...
#ifdef GOMP_USE_ALIGNED_WORK_SHARES
team = gomp_aligned_alloc (__alignof (struct gomp_team),
sizeof (*team) + nthreads * extra);
#else
team = team_malloc (sizeof (*team) + nthreads * extra);
#endif
...
gomp_barrier_init (&team->barrier, nthreads);
...
team->nthreads = nthreads;
}
-------------------
The 'gomp_get_thread_pool' used by 'get_last_team' is, for nvptx + gcn,
just (config/accel/pool.h):
gomp_get_thread_pool (struct gomp_thread *thr, unsigned nthreads)
{
/* NVPTX is running with a fixed pool of pre-started threads. */
return thr->thread_pool;
}
which does not depend on the number of threads. But
omp_get_num_threads (void)
{
struct gomp_team *team = gomp_thread ()->ts.team;
return team ? team->nthreads : 1;
}
seems to go wrong if the barrier was set up for fewer threads than
team->nthreads, as it still returns the unclamped value.
I wonder whether 'gomp_barrier_init' should return an updated
number of threads or …?
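To make that idea concrete, here is a minimal host-side sketch of a
barrier init that returns the clamped count, with the caller storing it.
All 'sketch_*' names are hypothetical stand-ins, not libgomp code, and
'sketch_hw_threads' is an assumed replacement for
__builtin_gcn_dim_size (1) so that this compiles outside the GCN target:

```c
#include <assert.h>

/* Simplified stand-in for gomp_barrier_t; only the field needed for
   this illustration is modelled.  */
typedef struct { unsigned total; } sketch_barrier_t;

/* Hypothetical replacement for __builtin_gcn_dim_size (1), so that the
   sketch also builds on a host.  The value 16 is an arbitrary choice.  */
static unsigned sketch_hw_threads = 16;

/* Clamp COUNT to the available threads and return the value that was
   actually used, so the caller can keep its bookkeeping in sync.  */
static inline unsigned
sketch_barrier_init (sketch_barrier_t *bar, unsigned count)
{
  assert (bar != 0);
  if (count > sketch_hw_threads)
    count = sketch_hw_threads;
  bar->total = count;
  return count;
}

/* Simplified stand-in for struct gomp_team.  */
struct sketch_team
{
  sketch_barrier_t barrier;
  unsigned nthreads;
};

/* Caller side, modelled on gomp_new_team: store the clamped value so
   that a later team->nthreads read agrees with the barrier.  */
static inline void
sketch_new_team (struct sketch_team *team, unsigned nthreads)
{
  nthreads = sketch_barrier_init (&team->barrier, nthreads);
  team->nthreads = nthreads;
}
```

Note that in the real gomp_new_team the allocation happens before the
barrier is initialized, so it would presumably still use the unclamped
value; that should only over-allocate, but I have not checked it.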
* * *
Additionally, I wonder whether gomp_barrier_reinit needs to be touched
as well – and not only gomp_barrier_init as this patch did.
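As a rough sketch of what the same clamp could look like on the reinit
side: the 'model_*' names are hypothetical, the barrier layout is a
simplified assumption (a 'total' plus an 'awaited' counter), and the
update is deliberately non-atomic, unlike whatever the real
gomp_barrier_reinit has to do:

```c
/* Simplified, assumed barrier layout: TOTAL threads participate and
   AWAITED of them have not yet arrived.  */
typedef struct { unsigned total; unsigned awaited; } model_barrier_t;

/* Hypothetical replacement for __builtin_gcn_dim_size (1); the value 16
   is an arbitrary choice for illustration.  */
static unsigned model_hw_threads = 16;

/* Clamp COUNT like the init path does, adjust AWAITED by the change in
   TOTAL, and return the count that was actually applied.  Unsigned
   wraparound makes the adjustment work for shrinking as well.  */
static inline unsigned
model_barrier_reinit (model_barrier_t *bar, unsigned count)
{
  if (count > model_hw_threads)
    count = model_hw_threads;
  bar->awaited += count - bar->total;
  bar->total = count;
  return count;
}
```

Whether the caller of the reinit path would also need the returned
value is a separate question that this sketch does not answer.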
* * *
BTW: when using the following in gomp_new_team:
nthreads = gomp_barrier_init (&team->barrier, nthreads);
the number of 'Success' results for the omptests increased considerably,
leaving only the following failures:
t-reduction/test.c:535: Failed
t-reduction/test.c:568: Failed at 0 with OUT[0] + num_tests[0], expected 2156, got 483
t-reduction/test.c:578: Failed
t-reduction/test.c:611: Failed at 0 with OUT[0] + num_tests[0], expected 2156, got 483
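Thus, this clearly shows that the current patch is not quite right:
propagating the clamped thread count to the team fixes most failures,
though not the t-reduction ones.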
Tobias