Hi!
We recently released our first OpenCL App for NVIDIA devices and ran into a
problem with device assignment.
When there are two OpenCL-capable NVIDIA devices in the system, a (recent) Client (correctly) starts two such tasks, but both actually end up
running on the same device while the other one sits idle.
Here are some details that may or may not help to track down the problem:
- BOINC Client version 7.0.28
- the two tasks correctly get additional command-line arguments --device 0 and
--device 1 respectively.
- the init_data.xml files of the two tasks contain
<gpu_type>NVIDIA</gpu_type>
<gpu_device_num>0</gpu_device_num>
<gpu_opencl_dev_index>0</gpu_opencl_dev_index>
and
<gpu_type>NVIDIA</gpu_type>
<gpu_device_num>1</gpu_device_num>
<gpu_opencl_dev_index>1</gpu_opencl_dev_index>
respectively
- the App was built with BOINC 9bef2edbf0fd6b9da2c282735935ce1b27727ddc (in current git-v2 repo, last commit was Charlie's "Fix file permissions" of
Jan 11)
- we have had OpenCL Apps for ATI/AMD out there for years without such problems.
Did any other project observe something similar? Any idea what the problem is
and how to solve it?
When we (Einstein@Home & BOINC) developed OpenCL support in BOINC, we found that device detection, and particularly device enumeration, is very
difficult and fragile, possibly even non-deterministic, especially when it comes to OpenCL (*). Therefore we agreed on the following scheme:
- the Client compiles and maintains a canonical (enumerated!) list of devices,
(e.g. in a shared memory)
- the Client passes each app an index into that list (e.g. via a --device command-line argument)
- there will be API functions that get the relevant device properties
("device_id"s or whatever) FROM THAT LIST based on the communicated index
- this way no App should ever attempt or be required to try a device enumeration itself, as it might end up with a very different list than what the
Client has
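To make the intended scheme concrete, here is a minimal sketch of what I have in mind. It is not existing BOINC code: CanonicalDevice, parse_device_index() and lookup_device() are hypothetical names, and the fields simply mirror the init_data.xml entries quoted above. The canonical list is assumed to have been handed over by the Client somehow (e.g. via shared memory).

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical record for one entry of the Client's canonical device list.
struct CanonicalDevice {
    std::string gpu_type;      // e.g. "NVIDIA"
    int gpu_device_num;        // device number as assigned by the Client
    int gpu_opencl_dev_index;  // OpenCL index as assigned by the Client
};

// Parse "--device N" from the command line.
// Returns the index, or -1 if the argument is absent or malformed.
long parse_device_index(int argc, const char* const* argv) {
    for (int i = 1; i + 1 < argc; ++i) {
        if (std::strcmp(argv[i], "--device") == 0) {
            char* end = nullptr;
            long v = std::strtol(argv[i + 1], &end, 10);
            if (end == argv[i + 1] || *end != '\0' || v < 0) return -1;
            return v;
        }
    }
    return -1;
}

// Look the device up in the Client's list. The App never enumerates
// devices itself; it only dereferences the index it was given.
// Returns nullptr if the index is out of range.
const CanonicalDevice* lookup_device(const std::vector<CanonicalDevice>& list,
                                     long index) {
    if (index < 0 || static_cast<std::size_t>(index) >= list.size()) return nullptr;
    return &list[static_cast<std::size_t>(index)];
}
```

The point of the design is that the index only has a meaning relative to the Client's list, so the App and the Client cannot disagree about which device is meant.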
Looking into boinc_get_opencl_ids_aux() in boinc_opencl.cpp I see that this agreement has apparently been completely ignored at least in the current
implementation.
Has there ever been an implementation of this agreement? If so, why and when
was it dropped? If not, why not?
Best,
Bernd
(*) The (IMHO horrible) truth is that, according to the OpenCL standard, not even two subsequent calls to clGetDeviceIDs() in the same process
necessarily return the OpenCL devices in the same order. In most implementations they probably do, but I could also imagine a driver that, instead of
traversing the whole PCI/PCI-bridge/Thunderbolt/USB/whatever tree (which also isn't as static as the author of the driver might think) in a
deterministic way, just "broadcasts" a call and returns the devices in the order in which the replies arrive.
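A small illustration of why a position in a freshly enumerated list is not a safe identifier. This is only a sketch with no OpenCL calls in it: EnumeratedDevice and find_by_key() are hypothetical, and the "stable key" (a PCI bus address is used in the comment purely as an example) is an assumption about what could serve as an identifier.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical: one device as returned by some enumeration, together with
// a key that does not depend on enumeration order (e.g. a PCI bus address).
struct EnumeratedDevice {
    std::string stable_key;
    std::string name;
};

// Return the position of the device with the given key in this particular
// enumeration, or -1 if it is not present.
long find_by_key(const std::vector<EnumeratedDevice>& devices,
                 const std::string& key) {
    for (std::size_t i = 0; i < devices.size(); ++i) {
        if (devices[i].stable_key == key) return static_cast<long>(i);
    }
    return -1;
}
```

If two enumerations return the same devices in a different order, "index 0" names a different device in each, while a lookup by key still finds the same one.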