Hi!

On 2024-03-07T15:07:32+0100, Tobias Burnus <tbur...@baylibre.com> wrote:
> first, I have the feeling we talk about (more or less) the same code
> region and use the same words – but we talk about rather different
> things. Thus, you confuse me (and possibly Andrew) – and my reply
> confuses you.

That, indeed, is my impression, too. :-/

And actually the biggest confusion seems to be that both of you would like
'GCN_SUPPRESS_HOST_FALLBACK' to mean something other than what
'HSA_SUPPRESS_HOST_FALLBACK' originally meant. Hopefully the
"GCN: The original meaning of 'GCN_SUPPRESS_HOST_FALLBACK' isn't applicable (non-shared memory system)"
does clarify that.

Just to close this out, let's try again for the other discussion items:

> Thomas Schwinge wrote:
>> On 2024-03-07T12:43:07+0100, Tobias Burnus<tbur...@baylibre.com> wrote:
>>> Thomas Schwinge wrote:
>>>> First, I think most users do not set GCN_SUPPRESS_HOST_FALLBACK – and it
>>>> is also not really desirable.
>> External users probably don't, but certainly all our internal testing is
>> setting it,
>
> First, I doubt it

'git grep --cached GCN_SUPPRESS_HOST_FALLBACK' in our internal scripts is
your friend.

> secondly, if it were true, it was broken for the
> last 5 years or so as we definitely did not notice fails due to not
> working offload devices. – Neither for AMD GCN nor ...

You're saying that 'GCN_SUPPRESS_HOST_FALLBACK=1' doesn't report certain
errors during device probing as fatal? That's not what the code says, nor
what my experience says.

>> and also implicitly all nvptx offloading testing: simply by
>> means of having ["no" missing here -- sorry!] such knob in the libgomp nvptx
>> plugin.
>
> I did see it at some places set for AMD but I do not see any
> nvptx-specific environment variable which permits to do the same.

Right, that was confusing: there was a "no" missing in that sentence --
sorry!

> However:
>> That is, the
>> libgomp nvptx plugin has an implicit 'suppress_host_fallback = true' for
>> (the original meaning of) that flag
>
> I think that's one of the problems here – you talk about
> suppress_host_fallback (implicit, original meaning), while I talk about
> the GCN_SUPPRESS_HOST_FALLBACK environment variable.

The 'suppress_host_fallback' internal variable directly corresponds to the
'GCN_SUPPRESS_HOST_FALLBACK' environment variable.

> Besides all the talk about suppress_host_fallback,
> 'init_hsa_runtime_functions' is not fatal' of the subject line seems to
> be something to be considered (beyond the patches you already suggested).

I'll next submit "GCN, nvptx: Errors during device probing are fatal".

>>> If I run on my Linux system the system compiler with nvptx + gcn suppost
>>> installed, I get (with a nvptx permission problem):
>>>
>>> $ GCN_SUPPRESS_HOST_FALLBACK=1 ./a.out
>>>
>>> libgomp: GCN host fallback has been suppressed
>>>
>>> And exit code = 1. The same result with '-foffload=disable' or with
>>> '-foffload=nvptx-none'.
>> I can't tell if that's what you expect to see there, or not?
>
> Well, obviously

In this discussion thread here, nothing was obvious to me anymore... ;-|

> not that I get this error by default – and as your
> wording indicated that the internal variable will be always true

That always-'true' suggestion was only for the *original* meaning of the
variable: the use in 'GOMP_OFFLOAD_can_run'.

> – and
> not only when the env var GCN_SUPPRESS_HOST_FALLBACK is explicit set, I
> worry that I would get the error any time.

That was exactly the point of my patch in this thread: to get rid of the
*additional*/*new* behavior that the libgomp GCN plugin derives from
'GCN_SUPPRESS_HOST_FALLBACK', different from what
'HSA_SUPPRESS_HOST_FALLBACK' originally meant. However, I now understand
that Andrew would like to keep that *new* behavior.

>> (For avoidance of doubt: I'm expecting silent host-fallback execution in
>> case that libgomp GCN and/or nvptx plugins are available, but no
>> corresponding devices. That's what my patch achieves.)
>
> I concur that the silent host fallback should happen by default (unless
> env vars tell otherwise) - at least when either no code was generated
> for the device (e.g. -foffload=disable) or when the vendor runtime
> library is not available or no device (be it no hardware or no permission).
>
> That's the current behavior and if that remains, my main concern evaporates.

ACK, thanks.


Grüße
 Thomas


>>> If we want to remove it, we can make it always false - but I am strongly
>>> against making it always true.
>> I'm confused. So you want the GCN and nvptx plugins to behave
>> differently in that regard?
> No – or at least: not unless GCN_SUPPRESS_HOST_FALLBACK is set.
>>> Use OMP_TARGET_OFFLOAD=mandatory (or that GCN env) if you want to
>>> prevent the host fallback, but don't break somewhat common systems.
>> That's an orthogonal concept?
>
> No – It's the same concept of the main use of the
> GCN_SUPPRESS_HOST_FALLBACK environment variable: You get a run-time
> error instead of a silent host fallback.
>
> But I have in the whole thread the feeling that – while talking about
> the same code region and throwing in the same words – we actually talk
> about completely different things.
>
> Tobias