https://gcc.gnu.org/bugzilla/show_bug.cgi?id=125877

            Bug ID: 125877
           Summary: [OpenMP] Implement omp_get_device_distances
           Product: gcc
           Version: 17.0
            Status: UNCONFIRMED
          Keywords: openmp
          Severity: normal
          Priority: P3
         Component: libgomp
          Assignee: unassigned at gcc dot gnu.org
          Reporter: burnus at gcc dot gnu.org
                CC: jakub at gcc dot gnu.org
  Target Milestone: ---

Implement:

void omp_get_device_distances(int ndevs, const int *devs, int *distances);

subroutine omp_get_device_distances(ndevs, devs, distances)
  integer, intent(in) :: ndevs, devs(*)
  integer, intent(out) :: distances(*)

"The value of each returned array element is an implementation-defined
 relative distance between the place on which the routine is executing
 and the device with the device-number specified by the corresponding
 array element of the devs argument."

Expected:
- For omp_invalid_device and all other invalid device numbers
  → return -1 (implementation choice)
- For the host device → 0 (OpenMP definition)
- For other devices, if not (yet) implemented, return some
  constant value > 0
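A minimal C sketch of these rules. It assumes, as in libgomp, that the host's device number equals omp_get_num_devices (), and that omp_initial_device is -1 (OpenMP 5.2). The function name and the placeholder distance 20 are hypothetical. The device count is passed in rather than queried so that the sketch is self-contained:

```c
/* Hypothetical placeholder for non-host devices while the real topology
   lookup is unimplemented; any constant > 0 fulfills the requirement.  */
#define SKETCH_DEFAULT_DISTANCE 20

/* Hypothetical sketch of the "Expected" rules.  NUM_DEVICES is the number
   of non-host devices (as returned by omp_get_num_devices); device numbers
   0 .. NUM_DEVICES-1 are non-host devices, NUM_DEVICES is the host, and -1
   is assumed to be omp_initial_device (also the host).  Anything else,
   including omp_invalid_device, is treated as invalid.  */
static void
sketch_get_device_distances (int ndevs, const int *devs, int *distances,
			     int num_devices)
{
  for (int i = 0; i < ndevs; i++)
    {
      int d = devs[i];
      if (d == num_devices || d == -1)
	distances[i] = 0;			 /* Host: 0 by definition.  */
      else if (d >= 0 && d < num_devices)
	distances[i] = SKETCH_DEFAULT_DISTANCE;	 /* Constant > 0.  */
      else
	distances[i] = -1;			 /* Invalid device number.  */
    }
}
```

With two non-host devices, devs = {0, 1, 2, -1, -4, 7} yields {20, 20, 0, 0, -1, -1}.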

NOTE: Valid ACPI (SLIT) distances range from 10 (local node) to 254.
NOTE: On virtualized machines, the NUMA distance might not be known;
Linux then reports -1, which needs to be changed to some number >= 0.
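The required mapping could be as simple as the following sketch. The function name is hypothetical, and the fallback value 20 is an arbitrary choice inside the ACPI range, not a value taken from libgomp:

```c
/* Hypothetical fallback for an unknown NUMA distance; 20 is arbitrary,
   picked only because it lies inside the ACPI range 10..254.  */
#define UNKNOWN_NUMA_DISTANCE 20

/* Map a raw distance as read from sysfs (-1 if unknown) to a value >= 0.  */
static int
sanitize_numa_distance (int raw)
{
  return raw < 0 ? UNKNOWN_NUMA_DISTANCE : raw;
}
```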


This requires two parts:

- determining the host + device topology
  (once, for the host and for AMD and Nvidia devices)

  Linux host:
  There are N = 'ls -d /sys/devices/system/node/node* | wc -l'
  nodes; for each node there are N distance values, stored in
    /sys/devices/system/node/node%d/distance
  e.g. for two nodes, node0/distance = '10 32' and node1/distance = '32 10'


  Device side:

  For Nvidia:
    CUresult cuDeviceGetPCIBusId (char *pciBusId, int len, CUdevice dev)
  returns the string
     [domain]:[bus]:[device].[function]
  where all four fields are hexadecimal values.

   
https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g85295e7d9745ab8f0aa80dd1e172acfc

   The Nvidia string can directly be fed into
     /sys/bus/pci/devices/<pciBusId>/numa_node

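   AMD: Similarly, via
     uint32_t domain, bdfid;
     r = hsa_agent_get_info (agent, (hsa_agent_info_t) HSA_AMD_AGENT_INFO_DOMAIN, &domain);
     r = hsa_agent_get_info (agent, (hsa_agent_info_t) HSA_AMD_AGENT_INFO_BDFID, &bdfid);
   with
     "/sys/bus/pci/devices/%04x:%02x:%02x.%x/numa_node",
     domain, (bdfid >> 8) & 0xFF, (bdfid >> 3) & 0x1F, bdfid & 0x7
   as the BDFID packs bus (bits 15-8), device (bits 7-3) and function (bits 2-0).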


- obtaining the host NUMA node of the current thread
  Linux: syscall to
   int getcpu(unsigned *cpu, unsigned *node, struct getcpu_cache *tcache);
  passing 'NULL, &node, NULL' as arguments.
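Taken together, the two parts could look roughly like the following Linux-only C sketch. It is not libgomp code, and all helper names are hypothetical. The caller is assumed to have obtained the Nvidia bus ID via cuDeviceGetPCIBusId and the AMD domain/BDFID via hsa_agent_get_info beforehand. The BDFID is assumed to use the standard PCI packing (bus in bits 15-8, device in bits 7-3, function in bits 2-0). The Nvidia string is lowercased defensively, as sysfs names use lowercase hex digits:

```c
#define _GNU_SOURCE
#include <ctype.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Parse a line of /sys/devices/system/node/node%d/distance such as
   "10 32" into OUT (at most MAX values); returns the number stored.  */
static int
parse_distance_line (const char *line, int *out, int max)
{
  int n = 0;
  char *end;
  while (n < max)
    {
      long v = strtol (line, &end, 10);
      if (end == line)
	break;			/* No further number on the line.  */
      out[n++] = (int) v;
      line = end;
    }
  return n;
}

/* Read the first integer of a sysfs file such as .../numa_node; returns -1
   if the file is missing or unparsable (the kernel uses -1 for unknown).  */
static int
read_sysfs_int (const char *path)
{
  int val = -1;
  FILE *f = fopen (path, "r");
  if (f)
    {
      if (fscanf (f, "%d", &val) != 1)
	val = -1;
      fclose (f);
    }
  return val;
}

/* Nvidia: BUS_ID as returned by cuDeviceGetPCIBusId.  */
static void
nvidia_numa_node_path (const char *bus_id, char *path, size_t len)
{
  snprintf (path, len, "/sys/bus/pci/devices/%s/numa_node", bus_id);
  for (char *p = path; *p; p++)
    *p = (char) tolower ((unsigned char) *p);	/* sysfs uses lowercase.  */
}

/* AMD: DOMAIN and BDFID as returned for HSA_AMD_AGENT_INFO_DOMAIN/_BDFID.  */
static void
amd_numa_node_path (uint32_t domain, uint32_t bdfid, char *path, size_t len)
{
  unsigned bus = (bdfid >> 8) & 0xFF;	/* Bits 15-8.  */
  unsigned dev = (bdfid >> 3) & 0x1F;	/* Bits 7-3.  */
  unsigned fn = bdfid & 0x7;		/* Bits 2-0.  */
  snprintf (path, len, "/sys/bus/pci/devices/%04x:%02x:%02x.%x/numa_node",
	    (unsigned) domain, bus, dev, fn);
}

/* NUMA node of the CPU the calling thread currently runs on, or -1.
   The cpu and tcache arguments are passed as NULL.  */
static int
current_numa_node (void)
{
  unsigned node;
  if (syscall (SYS_getcpu, NULL, &node, NULL) != 0)
    return -1;
  return (int) node;
}
```

The distance for a device would then be the entry for the device's numa_node in the distance row of current_numa_node (). Because the thread may migrate, the node reported by getcpu can be stale unless the thread is bound. On glibc 2.29 and later, the getcpu (&cpu, &node) wrapper from <sched.h> is an alternative to the raw syscall.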
