Sandra Loosemore wrote:
This patch, which was originally applied to the OG11 branch in 2021,
fixes a bug in which testcases using thread_limit larger than the
number of physical threads would crash with a memory fault. This was
exacerbated in testcases with a lot of register pressure because the
autoscaling reduces the number of physical threads to compensate for
the increased resource usage. We specifically saw this happen in the
t-reduction testcase in the external omptests testsuite. With this patch
that testcase now passes, and a couple of other failures are also fixed.
The included test case was greatly reduced from the t-reduction testcase
with c-vise and hand-editing. The code is nonsensical, but it was
triggering the memory fault with only 13 threads. It was also checked
with nvidia offloading and on x86_64 without offloading.
Here, the included testcase runs successfully on a gfx90a (MI210).
However, I can confirm that without the patch, omptests' t-reduction [1]
testcase fails with:

  Able to use offloading!
  Memory access fault by GPU node-1 (Agent handle: 0x3330d40) on address 0x800000000. Reason: Unknown.

While with the patch, it runs successfully, outputting 11283 lines, some
with 'Succeeded', others with 'Failed'. I also see one

  GCN team arena exhausted; configure with GCN_TEAM_ARENA_SIZE=bytes

But in any case, that's better than before.

* * *

The code has:

  int gpu_threads = 512;
  int max_threads = cpuExec ? cpu_threads : gpu_threads;
  ...
  for (int t = 0; t <= max_threads; t++) {
  ...
    TESTD2("omp target parallel num_threads(t) REDUCTION_MAP REDUCTION_CLAUSES",

and

  _Pragma("omp parallel num_threads(threads+max_threads/2) REDUCTION_CLAUSES")

[It seems to be okay to reduce the number of threads, if there is no
'strict' modifier.]

* * *

Tobias

[1] https://github.com/doru1004/omptests/blob/main/t-reduction/
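
To illustrate the remark about num_threads without the 'strict' modifier, here is a minimal host-side sketch. It is not taken from omptests or from the patch; the helper name granted_team_size is made up for this example. It requests a team size and reports what the runtime actually granted, which may be smaller than the request. The _OPENMP guards are only there so that the file also builds without -fopenmp, in which case the team size is simply 1.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: ask for 'requested' threads and return the team
   size the runtime actually granted.  Without a 'strict' modifier the
   runtime may grant fewer threads than requested.  */
static int
granted_team_size (int requested)
{
  int granted = 1;  /* Value when built without OpenMP support.  */

  assert (requested > 0);  /* num_threads needs a positive value.  */

#pragma omp parallel num_threads(requested)
  {
#ifdef _OPENMP
    /* One thread records the team size; the implicit barriers at the
       end of 'single' and 'parallel' make the store visible.  */
#pragma omp single
    granted = omp_get_num_threads ();
#endif
  }
  return granted;
}
```

Calling granted_team_size (512) on the host would be expected to return a value between 1 and 512, depending on the runtime's limits (for example OMP_THREAD_LIMIT). The same kind of check inside a target region is where the thread_limit handling on the device comes into play.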