https://gcc.gnu.org/bugzilla/show_bug.cgi?id=125877
Bug ID: 125877
Summary: [OpenMP] Implement omp_get_device_distances
Product: gcc
Version: 17.0
Status: UNCONFIRMED
Keywords: openmp
Severity: normal
Priority: P3
Component: libgomp
Assignee: unassigned at gcc dot gnu.org
Reporter: burnus at gcc dot gnu.org
CC: jakub at gcc dot gnu.org
Target Milestone: ---
Implement:
void omp_get_device_distances(int ndevs, const int *devs, int *distances);
subroutine omp_get_device_distances(ndevs, devs, distances)
integer, intent(in) :: ndevs, devs(*)
integer, intent(out) :: distances(*)
"The value of each returned array element is an implementation-defined
relative distance between the place on which the routine is executing
and the device with the device-number specified by the corresponding
array element of the devs argument."
Expected:
- For device == omp_invalid_device and all other invalid device numbers
→ return -1 (implementation choice)
- For device == host → return 0 (OpenMP definition)
- If unimplemented for a device, return some constant value > 0
NOTE: ACPI range is 10 to 254.
NOTE: on virtualized machines, the NUMA distance might not be known;
Linux then returns -1, which needs to be changed to some number >= 0.
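The expected values above can be summarized in a small helper. This is
only a sketch: dev_kind, expected_distance and FALLBACK_DISTANCE are
hypothetical names, and the fallback value 10 (the ACPI value for a
local node) is an assumed choice, not something this report prescribes.

```c
/* Hypothetical constant returned when no topology data is available.
   Assumption: 10, the smallest value in the ACPI range.  */
#define FALLBACK_DISTANCE 10

/* Hypothetical classification of a device number, done by the caller.  */
enum dev_kind { DEV_INVALID, DEV_HOST, DEV_OFFLOAD };

/* Map a device kind and the raw NUMA distance (possibly -1 when the
   kernel does not know it) to the value stored in distances[i].  */
static int
expected_distance (enum dev_kind kind, int raw_numa_distance)
{
  if (kind == DEV_INVALID)
    return -1;                  /* implementation choice */
  if (kind == DEV_HOST)
    return 0;                   /* OpenMP definition */
  if (raw_numa_distance < 0)
    return FALLBACK_DISTANCE;   /* unknown, e.g. virtualized machine */
  return raw_numa_distance;
}
```

With these assumptions, an offload device whose kernel-reported distance
is 32 yields 32, while one with an unknown distance (-1) yields the
fallback constant.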
This requires two parts:
- determining the host + device topology
(once, host + AMD + Nvidia devices)
Linux host:
There are N = 'ls -d /sys/devices/system/node/node* | wc -l'
nodes; for each node there are N distance values, stored in
/sys/devices/system/node/node%d/distance
e.g. for two nodes, node0/distance = '10 32' and node1/distance = '32 10'
Device side:
Nvidia:
CUresult cuDeviceGetPCIBusId ( char* pciBusId, int len, CUdevice dev )
which returns the string
[domain]:[bus]:[device].[function]
https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM_1g85295e7d9745ab8f0aa80dd1e172acfc
The returned string can be used directly as <pciBusId> in
/sys/bus/pci/devices/<pciBusId>/numa_node
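AMD: Similarly, using
r = hsa_agent_get_info (agent, HSA_AMD_AGENT_INFO_DOMAIN, &domain);
r = hsa_agent_get_info (agent, HSA_AMD_AGENT_INFO_BDFID, &bdfid);
with (assuming the BDF ID uses the usual PCI encoding: bus in
bits 15:8, device in bits 7:3, function in bits 2:0)
"/sys/bus/pci/devices/%04x:%02x:%02x.%x/numa_node",
domain, (bdfid >> 8) & 0xFF, (bdfid >> 3) & 0x1F, bdfid & 0x7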
- obtaining the host NUMA node of the current thread
Linux: syscall to
int getcpu(unsigned *cpu, unsigned *node, struct getcpu_cache *tcache);
passing 'NULL, &node, NULL' as arguments.
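The two parts above could be combined roughly as follows on a Linux
host. This is a minimal sketch, not the proposed libgomp code: all
function names are hypothetical, the PCI address is assumed to already
be in the form sysfs expects, and a real implementation would read the
topology once and cache it.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/syscall.h>

/* Parse one row of node%d/distance, e.g. "10 32".  Stores at most MAX
   values in OUT and returns the number of values parsed.  */
static int
parse_distance_row (const char *row, int *out, int max)
{
  int n = 0;
  while (n < max)
    {
      char *end;
      long v = strtol (row, &end, 10);
      if (end == row)
        break;                  /* no further number */
      out[n++] = (int) v;
      row = end;
    }
  return n;
}

/* Read one integer from PATH (e.g. .../numa_node); -1 on any error.  */
static int
read_int_file (const char *path)
{
  int v = -1;
  FILE *f = fopen (path, "r");
  if (f == NULL)
    return -1;
  if (fscanf (f, "%d", &v) != 1)
    v = -1;
  fclose (f);
  return v;
}

/* NUMA node of the calling thread via the getcpu syscall; -1 on error.  */
static int
current_numa_node (void)
{
#ifdef SYS_getcpu
  unsigned node = 0;
  if (syscall (SYS_getcpu, NULL, &node, NULL) == 0)
    return (int) node;
#endif
  return -1;
}

/* Distance between the current thread's node and the node of the PCI
   device PCI_ID ("[domain]:[bus]:[device].[function]"); -1 if unknown.  */
static int
host_to_device_distance (const char *pci_id)
{
  char path[256], buf[1024];
  int row[256];

  snprintf (path, sizeof path, "/sys/bus/pci/devices/%s/numa_node", pci_id);
  int dev_node = read_int_file (path);
  int host_node = current_numa_node ();
  if (dev_node < 0 || host_node < 0)
    return -1;

  snprintf (path, sizeof path,
            "/sys/devices/system/node/node%d/distance", host_node);
  FILE *f = fopen (path, "r");
  if (f == NULL)
    return -1;
  char *line = fgets (buf, sizeof buf, f);
  fclose (f);
  if (line == NULL)
    return -1;

  int n = parse_distance_row (buf, row, 256);
  return dev_node < n ? row[dev_node] : -1;
}
```

A -1 result here only means "unknown"; per the expectations listed
earlier, the caller would still have to map it to a non-negative value
before storing it in the distances array.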