Hi Bernd,
How is your application determining the cl_platform_id and cl_device_id of the
GPU assigned by the client?
You should be using the api/boinc_opencl.cpp from March 4 (BOINC 7.0.45) or
later. The version of api/boinc_opencl.cpp from January 11 is deprecated.
We discovered some time ago that the use of a single device number is overly
simplistic and problematic, because not all CUDA GPUs are recognized by OpenCL
on the Mac, and not all CAL-capable GPUs support OpenCL. All BOINC GPU apps
must use api/boinc_opencl.cpp to get the cl_platform_id and cl_device_id of the
GPU assigned by the client.
I recently wrote a detailed explanation to this (boinc_dev) email list. Here
it is again:
On Apr 15, 2013, at 4:09 PM, Charlie Fenton wrote:
> BOINC's GPU device number as displayed in the Event Log is for a physical
> GPU. In the case to which Raistmer refers, the first ATI GPU (GPU 0) is
> capable of (and recognized by) CAL but not OpenCL. The second ATI GPU (GPU
> 1) is recognized by and capable of both CAL and OpenCL.
>
> Thus, BOINC _correctly_ reports that ATI GPU 1 is the only OpenCL capable ATI
> GPU:
>> CAL: ATI GPU 0: ATI Radeon HD 2600 (RV630) (CAL version 1.4.1734, 1024MB,
>> 992MB available, 348 GFLOPS peak)
>> CAL: ATI GPU 1: ATI Radeon HD 4600 series (R730) (CAL version 1.4.1734,
>> 1024MB, 992MB available, 960 GFLOPS peak)
>> OpenCL: AMD/ATI GPU 1: ATI Radeon HD 4600 series (R730) (driver version CAL
>> 1.4.1734, device version OpenCL 1.0 AMD-APP (937.2), 1024MB, 992MB
>> available, 960 GFLOPS peak)
>
> The reason BOINC _must_ use the same index for the same physical GPU is to
> prevent assigning the same physical GPU to more than one task at a time.
> This is the number reported by --device, and is the same as the index of CAL
> or CUDA capable GPUs.
>
> As of BOINC version 7.0.12, we have added a second index, which is the index
> of only OpenCL-capable GPUs. In the above example, this would have the value
> 0 for the HD 4600, and this value provides the API-specific index Raistmer
> requests.
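
To make the two indexes concrete, here is a minimal sketch (not BOINC code) of
how a physical device number relates to an index counted over OpenCL-capable
GPUs only. PhysicalGpu, opencl_index_for() and example_host() are hypothetical
names invented for this illustration. The sketch also assumes OpenCL lists the
capable GPUs in the same relative order as the physical list, which is a
simplification and not something OpenCL guarantees.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record for one physical GPU of a single vendor. Its position
// in the vector plays the role of the device number passed via --device.
struct PhysicalGpu {
    std::string name;
    bool opencl_capable;  // false if only CAL (or CUDA) recognizes it
};

// Returns the index of GPU `device_num` counted among OpenCL-capable GPUs
// only, or -1 if that GPU does not exist or is not OpenCL-capable.
// Assumption: capable GPUs keep their relative order in the OpenCL list.
int opencl_index_for(const std::vector<PhysicalGpu>& gpus, int device_num) {
    if (device_num < 0 ||
        static_cast<std::size_t>(device_num) >= gpus.size()) {
        return -1;
    }
    if (!gpus[device_num].opencl_capable) {
        return -1;
    }
    int index = 0;
    for (int i = 0; i < device_num; ++i) {
        if (gpus[i].opencl_capable) ++index;  // count capable GPUs before it
    }
    return index;
}

// The host from the Event Log excerpt above.
std::vector<PhysicalGpu> example_host() {
    std::vector<PhysicalGpu> gpus;
    gpus.push_back(PhysicalGpu{"ATI Radeon HD 2600 (RV630)", false});       // GPU 0: CAL only
    gpus.push_back(PhysicalGpu{"ATI Radeon HD 4600 series (R730)", true});  // GPU 1: CAL and OpenCL
    return gpus;
}
```

With this host, opencl_index_for(example_host(), 1) yields 0 for the HD 4600,
while device number 0 (the HD 2600) has no OpenCL index at all.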
>
> We have deprecated the use of --device and now require GPU applications to
> instead call boinc_get_opencl_ids(int argc, char** argv, int type,
> cl_device_id* device, cl_platform_id* platform). This API also optionally
> allows an application to offer a plan class allowing it to run on all OpenCL
> capable GPUs, not just from one vendor.
>
> The reason for the change is that this newer API deals automatically with the
> possible difference between the CAL or CUDA device index and the OpenCL
> device index. As the comments in the source file explain:
> // A few complicating factors:
> // Windows & Linux have a separate OpenCL platform for each vendor
> // (NVIDIA, AMD, Intel).
> // Mac has only one platform (Apple) which reports GPUs from all vendors.
> //
> // In all systems, opencl_device_indexes start at 0 for each platform
> // and device_nums start at 0 for each vendor.
> //
> // On Macs, OpenCL does not always recognize all GPU models detected by
> // CUDA, so a device_num may not correspond to its opencl_device_index
> // even if all GPUs are from NVIDIA.
>
> I will add to this that we have recently learned that AMD's OpenCL does not
> always recognize all GPU models detected by CAL, so a device_num may not
> correspond to its opencl_device_index even if all GPUs are from ATI/AMD.
>
> NOTE: The new boinc_get_opencl_ids() API is 100% backward compatible with
> older versions of the BOINC client. From the source file's comments:
>
> // This version is compatible with older clients.
> // Usage:
> // Pass the argc and argv received from the BOINC client
> // type: may be PROC_TYPE_NVIDIA_GPU, PROC_TYPE_AMD_GPU or PROC_TYPE_INTEL_GPU
> // (it may also be 0, but then it will fail on older clients.)
> //
> // The argc, argv and type arguments are ignored for 7.0.12 or later clients.
> //
> // returns
> // - 0 if success
> // - ERR_FOPEN if init_data.xml missing
> // - ERR_XML_PARSE if can't parse init_data.xml
> // - CL_INVALID_DEVICE_TYPE if unable to get gpu_type information
> // - ERR_NOT_FOUND if unable to get opencl_device_index or gpu device_num
> // - an OpenCL error number if OpenCL error
>
> To further clarify how the newer API is backward compatible:
> int boinc_get_opencl_ids(int argc, char** argv, int type, cl_device_id*
> device, cl_platform_id* platform)
>
> First it tries to get the GPU "type" (vendor): ATI, NVIDIA or Intel, from the
> init_data.xml file. If that fails, it gets it from the type argument to the
> function, if present.
>
> Next it tries to get the gpu_opencl_dev_index and gpu_device_num from the
> init_data.xml file. If that fails, it gets the gpu_device_num from the
> --device argument.
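
The fallback order described above can be sketched roughly as follows. This is
an illustration of the order of precedence only, not the code in
api/boinc_opencl.cpp: InitData, Resolved, device_from_args() and resolve_gpu()
are hypothetical names, the vendor is a plain string here (the real API takes
an int such as PROC_TYPE_NVIDIA_GPU), and -1 or an empty string stands for
"field missing from init_data.xml".

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <string>

// Hypothetical stand-in for the GPU fields read from init_data.xml.
struct InitData {
    std::string gpu_type;           // empty if absent (e.g. older client)
    int gpu_device_num = -1;        // -1 if absent
    int gpu_opencl_dev_index = -1;  // -1 if absent
};

struct Resolved {
    std::string gpu_type;
    int device_num = -1;        // index among the vendor's physical GPUs
    int opencl_dev_index = -1;  // index among OpenCL-capable GPUs, if known
};

// Scans argv for "--device N" and returns N, or -1 if the option is absent.
// atoi() is used for brevity; it returns 0 for non-numeric input.
int device_from_args(int argc, char** argv) {
    for (int i = 0; i + 1 < argc; ++i) {
        if (std::strcmp(argv[i], "--device") == 0) {
            return std::atoi(argv[i + 1]);
        }
    }
    return -1;
}

// Returns false if no vendor or no device index can be determined.
bool resolve_gpu(const InitData& init, int argc, char** argv,
                 const std::string& type_arg, Resolved& out) {
    out = Resolved();

    // 1. Vendor: init_data.xml first, then the caller-supplied type.
    if (!init.gpu_type.empty()) {
        out.gpu_type = init.gpu_type;
    } else if (!type_arg.empty()) {
        out.gpu_type = type_arg;
    } else {
        return false;
    }

    // 2. Indexes: init_data.xml first, then the --device argument.
    out.device_num = init.gpu_device_num;
    out.opencl_dev_index = init.gpu_opencl_dev_index;
    if (out.device_num < 0 && out.opencl_dev_index < 0) {
        out.device_num = device_from_args(argc, argv);
    }
    return out.device_num >= 0 || out.opencl_dev_index >= 0;
}
```

In this sketch, a task whose init_data.xml carries all three fields ignores
--device entirely, while a task started by an older client (no GPU fields in
init_data.xml) falls back to the type argument and to --device.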
Does this answer your questions?
Cheers,
--Charlie
On Apr 25, 2013, at 12:43 AM, Bernd Machenschalk wrote:
> Hi!
>
> We recently issued our first OpenCL App for NVidia devices and ran into a
> problem with device assignment.
>
> When there are two NVidia devices capable of OpenCL in the system, a (recent)
> Client (correctly) starts two such tasks, but both end up actually running on
> the same device while the other one sits idle.
>
> Here are some details that may or may not help to track down the problem:
>
> - BOINC Client version 7.0.28
> - the two tasks correctly get additional command-line arguments --device 0
> and --device 1 respectively.
> - the init_data.xml files of the two tasks contain
> <gpu_type>NVIDIA</gpu_type>
> <gpu_device_num>0</gpu_device_num>
> <gpu_opencl_dev_index>0</gpu_opencl_dev_index>
> and
> <gpu_type>NVIDIA</gpu_type>
> <gpu_device_num>1</gpu_device_num>
> <gpu_opencl_dev_index>1</gpu_opencl_dev_index>
> respectively
> - the App was built with BOINC 9bef2edbf0fd6b9da2c282735935ce1b27727ddc (in
> current git-v2 repo, last commit was Charlie's "Fix file permissions" of Jan
> 11)
>
> - we have had OpenCL Apps for ATI/AMD out there for years without such problems.
>
> Did any other project observe something similar? Any idea what the problem is
> and how to solve it?
>
>
> When we (Einstein@Home & BOINC) developed OpenCL support in BOINC, we found
> that device detection and particularly device enumeration is highly difficult
> and fragile, possibly even non-deterministic, especially when it comes to
> OpenCL (*). Therefore we agreed on the following scheme:
> - the Client compiles and maintains a canonical (enumerated!) list of
> devices, (e.g. in a shared memory)
> - the apps are communicated an index to that list (e.g. via --device CLA)
> - there will be API functions that get the relevant device properties
> ("device_id"s or whatever) FROM THAT LIST based on the communicated index
> - this way no App should ever attempt or be required to try a device
> enumeration itself, as it might end up with a very different list than what
> the Client has
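
For reference, the scheme listed above can be sketched as a read-only lookup
into a list that only the Client builds. This is a hypothetical illustration,
not an existing BOINC interface: DeviceEntry and CanonicalDeviceList are
made-up names, and a plain string stands in for whatever device ids a real
implementation would store.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical entry in the Client's canonical device list.
struct DeviceEntry {
    std::string vendor;
    std::string device_id;  // placeholder for the real device properties
};

// Built once by the Client; apps only ever read from it.
class CanonicalDeviceList {
public:
    explicit CanonicalDeviceList(std::vector<DeviceEntry> entries)
        : entries_(std::move(entries)) {}

    // Returns the entry for the communicated index, or nullptr if the index
    // is out of range. The app never enumerates devices itself.
    const DeviceEntry* lookup(int index) const {
        if (index < 0 ||
            static_cast<std::size_t>(index) >= entries_.size()) {
            return nullptr;
        }
        return &entries_[index];
    }

private:
    std::vector<DeviceEntry> entries_;
};
```

Under this scheme, two tasks started with --device 0 and --device 1 would call
lookup(0) and lookup(1) and receive two different entries, regardless of the
order in which the driver happens to enumerate devices inside each app.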
>
> Looking into boinc_get_opencl_ids_aux() in boinc_opencl.cpp I see that this
> agreement has apparently been completely ignored at least in the current
> implementation.
>
> Had there ever been an implementation of this agreement? If so, why and when
> was it dropped? If not, why not?
>
> Best,
> Bernd
>
>
> (*) The (IMHO horrible) truth is that according to the OpenCL standard not even
> two subsequent calls to clGetDeviceIDs() in the same process necessarily
> return the OpenCL devices in the same order. In most implementations they
> might, but I could also imagine a driver that, instead of traversing the whole
> PCI/PCI-bridge/Thunderbolt/USB/whatever tree (which also isn't as static as
> the author of the driver might think) in a deterministic way, just
> "broadcasts" a call and returns the devices in the order of arrival of
> replies.
> _______________________________________________
> boinc_dev mailing list
> [email protected]
> http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
> To unsubscribe, visit the above URL and
> (near bottom of page) enter your email address.
>