JonChesterfield added a comment.

This change is partly motivated by wanting to check in runtime tests for openmp 
that execute on whatever hardware is available locally. It is functionally 
similar to an out of tree bash script called mygpu that contains manually 
curated tables of pci.ids and to a python script called rocm_agent_enumerator 
that calls a c++ tool called rocminfo and tries to parse the output, with a 
different table of pci.ids for when that fails.

Ultimately, the bottom of this stack is a map from pci.id to gfx number stored 
in the user space driver thunk library, roct. That is linked into hsa. It would 
be simpler programming to copy&paste that map from roct into the openmp clang 
driver at the cost of inevitable divergence between the architecture clang 
detects and the architecture the runtime libraries detect. Spawning a process 
and reading stdout is a bit messy, but it beats copying the table, and it beats 
linking the gpu driver into clang in order to get at the table of numbers. This 
seems the right balance to me.

It should be possible to do something similar with cuda for nvptx, but that 
should be a separate executable. Partly to handle the permutations of cuda / 
hsa that may be present on a system. I haven't found the corresponding API call 
in cuda. The standalone tool nvidia-smi might be willing to emit sm_70 or 
similar to stdout, but I haven't found the right command line flags for that 
either. Rocminfo does not appear to be configurable, and is not necessarily 
present when compiling for amdgpu.

A bunch of comments inline, mostly style. I think there's a use-after-free bug.

It looks like our existing command line handling could be more robust. In 
particular, there should be error messages about march where there seem to be 
asserts. That makes it slightly unclear how we should handle things like the 
helper tool returning a string clang doesn't recognise (e.g. doesn't start with 
gfx). Something to revise separate to this patch I think.



================
Comment at: clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp:78
+detectSystemGPUs(const ToolChain &T) {
+  auto Program = T.GetProgramPath("amdgpu-arch");
+  llvm::SmallVector<llvm::StringRef, 8> execArgs;
----------------
AMDGPU_ARCH_PROGRAM_NAME?


================
Comment at: clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp:83
+                                     OutputFile);
+  llvm::FileRemover OutputRemover(OutputFile.c_str());
+  llvm::Optional<llvm::StringRef> Redirects[] = {
----------------
This looks like there are too many stringrefs. Redirecting stdout to a 
temporary file seems reasonable, but I'd expect the pointers into OutputBuf to 
be invalid when it drops out of scope. Perhaps return a smallvector of 
smallstrings instead?

Also, we're probably expecting fewer than 8 different gpus, probably as few as 
1 in the most common case, so maybe a smallvector<type,1>


================
Comment at: clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp:90
+
+  if (const int RC =
+          llvm::sys::ExecuteAndWait(Program.c_str(), execArgs, {}, Redirects)) 
{
----------------
`s/const int RC =//`


================
Comment at: clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp:91
+  if (const int RC =
+          llvm::sys::ExecuteAndWait(Program.c_str(), execArgs, {}, Redirects)) 
{
+    return {};
----------------
can we pass {} for execArgs here?


================
Comment at: clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp:111
+  }
+  if (GPUArchs.size() > 1) {
+    bool AllSame = std::all_of(
----------------
Perhaps run all_of against the whole range and drop if size () > 1 test?


================
Comment at: clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp:208
   assert(GPUArch.startswith("gfx") && "Unsupported sub arch");
+  assert(!GPUArch.empty() && "Unable to detect system GPU");
 
----------------
We shouldn't be handling unknown or missing march= fields with asserts. I see 
that this is already the case in multiple places, so let's go with a matching 
assert for this and aspire to fix that in a separate patch.


================
Comment at: clang/tools/amdgpu-arch/AMDGPUArch.cpp:32
+
+    HSAAgentCollector *Self = static_cast<HSAAgentCollector *>(Data);
+    char GPUName[64];
----------------
Simpler code if we drop the class and pass in the vector<string> itself as the 
void*


================
Comment at: clang/tools/amdgpu-arch/AMDGPUArch.cpp:34
+    char GPUName[64];
+    Status = hsa_agent_get_info(Agent, HSA_AGENT_INFO_NAME, GPUName);
+    if (Status != HSA_STATUS_SUCCESS) {
----------------
Does this null terminate for any length of GPU name? Wondering if we should 
explicitly zero out the last char.


================
Comment at: clang/tools/amdgpu-arch/AMDGPUArch.cpp:45
+    if (Status != HSA_STATUS_SUCCESS) {
+      fprintf(stderr, "Unable to initialize HSA\n");
+    }
----------------
Unsure these should be writing to stderr. We capture stdout, stderr probably 
goes to the user. We could exit 1 instead as clang is going to treat any 
failure to guess the arch identically 


================
Comment at: clang/tools/amdgpu-arch/CMakeLists.txt:11
+
+find_package(hsa-runtime64 QUIET 1.2.0 HINTS ${CMAKE_INSTALL_PREFIX} PATHS 
/opt/rocm)
+if (NOT ${hsa-runtime64_FOUND})
----------------
Assuming this matches the logic in amdgpu openmp plugin


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99949/new/

https://reviews.llvm.org/D99949

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to