Re: [boinc_dev] Accelerator type identification issue

Richard Haselgrove Tue, 16 Apr 2013 03:27:04 -0700

Typo alert. "The ... file that the BOINC client writes to each slot" (second 
paragraph) is of course init_data.xml, and I suspect the sample file contents 
likewise. Calling it app_info.xml may set people's minds running down the wrong 
path.




>________________________________
> From: Charlie Fenton <[email protected]>
>To: Raistmer the Sorcerer <[email protected]> 
>Cc: [email protected] 
>Sent: Tuesday, 16 April 2013, 11:18
>Subject: Re: [boinc_dev] Accelerator type identification issue
> 
>
>Hello Raistmer,
>
>I am sorry if I have not been clear enough in my responses.  We have _already_ 
>done what you are requesting, but not the way you suggest.  Doing it the way 
>you suggest would be incompatible with existing applications, servers and 
>older versions of BOINC and so would break older code.  Everything we have 
>done is fully backward compatible.
>
>Under recent versions of BOINC, we have added more values to the app_info.xml 
>file that the BOINC client writes to each slot.  Let us use as an example the 
>case we have been discussing in the SETI forum, where the user has 2 ATI GPUs. 
>The first GPU (ATI GPU 0) is capable of (and recognized by) CAL but not 
>OpenCL.  The second ATI GPU (GPU 1) is recognized by and capable of both CAL 
>and OpenCL.
>
>Thus, my test version of BOINC _correctly_ reports that ATI GPU 1 is the only 
>OpenCL capable ATI GPU:
>> CAL: ATI GPU 0: ATI Radeon HD 2600 (RV630) (CAL version 1.4.1734, 1024MB, 
>> 992MB available, 348 GFLOPS peak)
>> CAL: ATI GPU 1: ATI Radeon HD 4600 series (R730) (CAL version 1.4.1734, 
>> 1024MB, 992MB available, 960 GFLOPS peak)
>> OpenCL: AMD/ATI GPU 1: ATI Radeon HD 4600 series (R730) (driver version CAL 
>> 1.4.1734, device version OpenCL 1.0 AMD-APP (937.2), 1024MB, 992MB 
>> available, 960 GFLOPS peak)
>
>His app_info.xml file will now contain the following values specifying that 
>the application should use the second ATI GPU, which has ATI gpu_device_num 1 
>and, as the first OpenCL-capable ATI GPU, has ATI gpu_opencl_dev_index 0:
><gpu_type>ATI</gpu_type>
><gpu_device_num>1</gpu_device_num>
><gpu_opencl_dev_index>0</gpu_opencl_dev_index>
>
>Older versions of init_data.xml don't have the gpu_opencl_dev_index field.  
>Still older versions of init_data.xml don't have the gpu_device_num or the 
>gpu_type field.
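
A minimal sketch of pulling those three values out of an init_data.xml 
fragment, and of noticing when an older client has left a field out. The 
helper names element_text() and element_int() are hypothetical and not part of 
the BOINC API; a real application would normally obtain these values through 
the BOINC API rather than scanning the file by hand.

```cpp
#include <cassert>
#include <exception>
#include <string>

// Hypothetical helper (not BOINC code): copy the text between <tag> and
// </tag> into 'text'. Returns false if the element is absent.
bool element_text(const std::string& xml, const std::string& tag,
                  std::string& text) {
    const std::string open = "<" + tag + ">";
    const std::string close = "</" + tag + ">";
    const std::string::size_type start = xml.find(open);
    if (start == std::string::npos) return false;
    const std::string::size_type body = start + open.size();
    const std::string::size_type end = xml.find(close, body);
    if (end == std::string::npos) return false;
    text = xml.substr(body, end - body);
    return true;
}

// Hypothetical helper: read an integer element. Returns false if the
// element is missing (as on older clients) or is not a number.
bool element_int(const std::string& xml, const std::string& tag, int& value) {
    std::string text;
    if (!element_text(xml, tag, text)) return false;
    try {
        value = std::stoi(text);
    } catch (const std::exception&) {
        return false;
    }
    return true;
}
```

With the three-line sample above as input, element_int() yields 1 for 
gpu_device_num and 0 for gpu_opencl_dev_index; on a file written by an older 
client the lookup for gpu_opencl_dev_index simply returns false.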
>
>So those versions of BOINC passed the gpu_device_num to the application in the 
>command line.  If the value of gpu_device_num was 1, they would pass "--device 
>1" in the command line.
>
>For backward compatibility, BOINC _still_ passes the gpu_device_num to the 
>application in the command line; in our example case it _still_ passes 
>"--device 1" in the command line.  To do anything different would break 
>compatibility with older applications!
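
For illustration, an application could recover the number passed as 
"--device N" along these lines. This is only a sketch: 
device_from_command_line() is a hypothetical name, not a BOINC function, and 
it handles only the exact "--device" spelling quoted above.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical helper (not BOINC code): return the non-negative integer that
// follows "--device" on the command line, or -1 if it is absent or malformed.
int device_from_command_line(int argc, char** argv) {
    for (int i = 1; i + 1 < argc; ++i) {
        if (std::strcmp(argv[i], "--device") == 0) {
            char* end = nullptr;
            const long n = std::strtol(argv[i + 1], &end, 10);
            // Accept the value only if the whole argument was a number.
            if (end != argv[i + 1] && *end == '\0' && n >= 0) {
                return static_cast<int>(n);
            }
            return -1;
        }
    }
    return -1;
}
```

Called with the arguments from the example case ("--device 1"), this returns 
1, which is the physical gpu_device_num and not the OpenCL index.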
>
>Around August 2011, we realized that passing the device number is not 
>sufficient if a user had both ATI and NVIDIA GPUs on the same computer, so we 
>created the API:
>   int boinc_get_opencl_ids(cl_device_id* device, cl_platform_id* platform);
>which gets the GPU vendor and device number from the init_data.xml file.
>
>In January 2012, we discovered that on Macs, Apple's OpenCL does not support 
>some NVIDIA GPUs which CUDA does support, so we added the gpu_opencl_dev_index 
>field.  This allowed the boinc_get_opencl_ids() API to handle these correctly. 
> OpenCL project applications did not need to worry about this, as long as they 
>were linked with a current version of boinc_get_opencl_ids().  When used with 
>older versions of BOINC which do not provide the gpu_opencl_dev_index field, 
>boinc_get_opencl_ids() reverts to using only gpu_device_num to be as backward 
>compatible as possible.
>
>But in December 2012, we realized that boinc_get_opencl_ids() was not 
>compatible with very old clients which did not provide the gpu_device_num or 
>the gpu_type field.  So we deprecated the old boinc_get_opencl_ids() API and 
>added a new version which takes 5 arguments:
>  int boinc_get_opencl_ids(int argc, char** argv, int type,
>                           cl_device_id* device, cl_platform_id* platform);
>
>Passing in the same argv and argc which were passed to the application allows 
>this function to use the value of --device from the command line for 
>compatibility with very old BOINC clients which did not have the 
>gpu_device_num field in the init_data.xml file.  This gives us even better 
>backward compatibility than we had before.  And allowing the project 
>application to pass in the type (NVIDIA, ATI or Intel) allows it to work with 
>older BOINC clients which did not have the gpu_type in the init_data.xml file.
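
The order of preference described above can be sketched as follows. This is 
not the BOINC implementation: InitData, Selection and choose_device() are 
hypothetical names, and the sketch assumes that -1 or an empty string stands 
for "this client did not provide the value".

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the values a client may or may not write to
// init_data.xml. Empty string / -1 mean "not provided by this client".
struct InitData {
    std::string gpu_type;
    int gpu_device_num;
    int gpu_opencl_dev_index;
};

// Hypothetical result: which vendor and which index to use, and whether the
// index is already an OpenCL index or still a physical device number.
struct Selection {
    std::string gpu_type;
    int index;
    bool index_is_opencl;
    bool ok;
};

Selection choose_device(const InitData& d, int cmdline_device,
                        const std::string& caller_type) {
    Selection s;
    // Vendor: prefer init_data.xml, else the type the application passed in.
    s.gpu_type = d.gpu_type.empty() ? caller_type : d.gpu_type;
    s.index_is_opencl = false;
    if (d.gpu_opencl_dev_index >= 0) {
        // Newest clients: the OpenCL index is given directly.
        s.index = d.gpu_opencl_dev_index;
        s.index_is_opencl = true;
    } else if (d.gpu_device_num >= 0) {
        // Older clients: only the physical device number is available.
        s.index = d.gpu_device_num;
    } else {
        // Very old clients: fall back to the value of --device.
        s.index = cmdline_device;
    }
    s.ok = !s.gpu_type.empty() && s.index >= 0;
    return s;
}
```

In this sketch, a very old client combined with no caller-supplied type leaves 
the vendor unknown and the selection fails, which is why passing the type 
matters for older clients.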
>
>This newest boinc_get_opencl_ids() API has an added feature.  If your OpenCL 
>application can run on any vendor's GPU, then you can create a plan class 
>telling BOINC that the vendor (gpu_type) does not matter.  On any version of 
>BOINC new enough to put the gpu_type in the init_data.xml file, that one 
>application will run on whichever GPU is assigned by BOINC; you will no longer 
>need separate copies of the same OpenCL application for each GPU vendor.
>
>I looked at the source code for your OpenCL and Brook anonymous platform SETI 
>Astropulse applications.  I see they do not examine the --device argument 
>directly, but instead call the older version of boinc_get_opencl_ids() with 2 
>arguments.  I strongly recommend you update to the newer, 5 argument version 
>to have backward compatibility with even older versions of the BOINC client.
>
>A release of the BOINC client in the near future will handle most situations 
>where one or more ATI/AMD GPUs support CAL but not OpenCL.  Both the 
>_existing_ 2-argument and the _existing_ 5-argument versions of 
>boinc_get_opencl_ids() will take advantage of the improved GPU detection logic 
>in this new BOINC client.
>
>I have one more suggestion.  In your OpenCL anonymous platform SETI Astropulse 
>application, the application writes "BOINC assigns device %d" with the value 
>of BOINCs_device, which is the value of the gpu_opencl_dev_index.  This is 
>confusing to users, who have seen the GPUs identified by their physical device 
>number gpu_device_num in the Event Log.  It would be better if the application 
>would display the physical device number, and use the gpu_opencl_dev_index 
>only internally.  You can get the value of gpu_device_num either from the 
>--device command-line argument, or from the gpu_device_num field of the 
>init_data struct.
>
>Cheers,
>--Charlie
>
>On Apr 16, 2013, at 1:12 AM, Raistmer the Sorcerer wrote:
>> >The reason BOINC _must_ use the same index for the same physical GPU is to 
>> >prevent assigning the same physical GPU to more than one task at a time. 
>> >This is the number reported by --device, and is the same as the index of 
>> >CAL or CUDA capable GPUs. 
>> 
>> BOINC - yes (inside the scheduler), but should BOINC report that physical 
>> number to scientific apps? No. It should not!
>> For what reason should --device N mean PHYSICAL DEVICE?
>> What I propose is to define --device N as follows: an index into the device 
>> array returned by the enumeration API that an app of the corresponding type 
>> uses.
>> That is, if the app is a CAL app then --device N means an index into the 
>> array of CAL devices.
>> If it's a CUDA app then --device is an index into the array of OpenCL 
>> devices (of the corresponding type: NV, ATI, or intel_gpu).
>> And so on.
>> Look!
>> Currently we can have an NV GPU + an ATI GPU in the same OS. So, 2 physical 
>> devices. 
>> But each of the NV and ATI apps will receive --device 0! As it should be if 
>> the device is defined as I propose, not as just "physical device". There 
>> are 2 physical devices of different types.
>> In the case of CAL and OpenCL there are also 2 different physical devices: 2 
>> devices of CAL type and 1 device of OpenCL type. BOINC (and ONLY the BOINC 
>> CLIENT) should know that device X from the CAL list is the same as device Y 
>> from the OpenCL list.
>> 
>> 
>> As of BOINC version 7.0.12, we have added a second index, which is the index 
>> of only openCL-capable GPUs. In the above example, this would have the value 
>> 0 for the HD 4600, and this value provides the API-specific index Raistmer 
>> requests.
>> 
>> For these reasons we have deprecated the use of --device and now require GPU 
>> applications to instead call boinc_get_opencl_ids(int argc, char** argv, int 
>> type, cl_device_id* device, cl_platform_id* platform). This API also 
>> optionally allows an application to offer a plan class allowing it to run on 
>> all OpenCL capable GPUs, not just those from one vendor.
>> 
>> The reason for the change is that this newer API deals automatically with 
>> the possible difference between the CAL or CUDA device index and the OpenCL 
>> device index. As the comments in the source file explain:
>> // A few complicating factors:
>> // Windows & Linux have a separate OpenCL platform for each vendor
>> // (NVIDIA, AMD, Intel).
>> // Mac has only one platform (Apple) which reports GPUs from all vendors.
>> //
>> // In all systems, opencl_device_indexes start at 0 for each platform
>> // and device_nums start at 0 for each vendor.
>> //
>> // On Macs, OpenCL does not always recognize all GPU models detected by 
>> // CUDA, so a device_num may not correspond to its opencl_device_index 
>> // even if all GPUs are from NVIDIA.
>> 
>> I will add to this that we have recently learned that AMD's OpenCL does not 
>> always recognize all GPU models detected by CAL, so a device_num may not 
>> correspond to its opencl_device_index even if all GPUs are from ATI/AMD.
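
A small sketch of why the two numbers can differ. It assumes, purely for 
illustration, that OpenCL lists the GPUs it recognizes in the same relative 
order as CAL or CUDA does; the real client may match devices differently. 
opencl_index_for() is a hypothetical helper, not BOINC code.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not BOINC code). opencl_capable[i] tells whether the
// GPU with device_num i is also recognized by OpenCL. Returns that GPU's
// index among the OpenCL-capable GPUs only, or -1 if OpenCL cannot use it.
// Assumption: OpenCL enumerates its GPUs in the same relative order.
int opencl_index_for(const std::vector<bool>& opencl_capable, int device_num) {
    if (device_num < 0 ||
        device_num >= static_cast<int>(opencl_capable.size())) {
        return -1;
    }
    if (!opencl_capable[device_num]) return -1;
    int index = 0;
    for (int i = 0; i < device_num; ++i) {
        if (opencl_capable[i]) ++index;  // count capable GPUs listed earlier
    }
    return index;
}
```

For the two-GPU example in this thread (device_num 0 is CAL only, device_num 1 
supports both), the sketch gives OpenCL index 0 for device_num 1 and -1 for 
device_num 0.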
>> 
>> NOTE: The new boinc_get_opencl_ids() API is 100% backward compatible with 
>> older versions of the BOINC client. From the source file's comments:
>> 
>> // This version is compatible with older clients.
>> // Usage:
>> // Pass the argc and argv received from the BOINC client
>> // type: may be PROC_TYPE_NVIDIA_GPU, PROC_TYPE_AMD_GPU or 
>> // PROC_TYPE_INTEL_GPU
>> // (it may also be 0, but then it will fail on older clients.)
>> //
>> // The argc, argv and type arguments are ignored for 7.0.12 or later clients.
>> //
>> // returns
>> // - 0 if success
>> // - ERR_FOPEN if init_data.xml missing
>> // - ERR_XML_PARSE if can't parse init_data.xml
>> // - CL_INVALID_DEVICE_TYPE if unable to get gpu_type information
>> // - ERR_NOT_FOUND if unable to get opencl_device_index or gpu device_num
>> // - an OpenCL error number if OpenCL error
>> 
>> Finally, we have added two new prototype plan classes: opencl_nvidia_101 and 
>> opencl_ati_101 for app versions that run on NVIDIA or ATI GPUs using OpenCL 
>> 1.1, using at most 256MB of GPU RAM. You can modify sched_customize.cpp to 
>> change these parameters or add your own plan classes, such as for OpenCL 1.0 
>> or 1.2. These plan classes are not backward compatible and require BOINC 
>> 7.0.x.
>> 
>> Information about all of the above can be found at 
>> <http://boinc.berkeley.edu/trac/wiki/OpenclApps>.
>> 
>> I hope this answers your questions.
>> 
>> Cheers,
>> --Charlie
>> 
>> On Apr 15, 2013, at 6:48 AM, Raistmer the Sorcerer wrote:
>> > Regarding deprecation of the --device N option:
>> > can anyone describe the reason it was done?
>> > 
>> > Each API contains its own enumeration.
>> > Each enumeration (in a particular device class) starts from zero (0).
>> > What prevents BOINC from reporting --device N to the app correctly if 
>> > BOINC knows which accelerator class the app is designed for?
>> > In view of the recent CAL/OpenCL issue (or the OSX CUDA/OpenCL issue, no 
>> > matter):
>> > --device N for CAL should be 1 and 0 (2 CAL-enabled devices installed);
>> > --device N for OpenCL should be only 0 (1 OpenCL-capable device 
>> > installed). BOINC keeps track of which device is which physical device; 
>> > the app just needs the device number in its own API's enumeration scheme.
>> > For what reason (for example) should my OpenCL app know that there is 
>> > another, non-OpenCL device in the system? It should not. Hence, no 
>> > "device 1", but "device 0". It doesn't care about the keyboard or mouse, 
>> > and it should not care about a CAL GPU either. It's BOINC's mission not to 
>> > allocate the same physical device as both CAL and OpenCL at the same time.
>> > Currently the app receives an OpenCL context handle. OK, no problem with 
>> > that. But (!) ensure backward compatibility! That OpenCL context should 
>> > contain the same device that the OpenCL enumeration API would provide if 
>> > --device contained an offset into the device list. What particular issues 
>> > do you see with this? But by providing both --device _and_ an OpenCL 
>> > context (why a context is needed is a separate question, but perhaps it is 
>> > sometimes convenient indeed) you provide at least partial backward 
>> > compatibility. If one can provide backward compatibility, it should be 
>> > done! 
>> > BOINC is all about using AVAILABLE user resources, the ones users already 
>> > have. It is not about asking users to upgrade their OS, buy new hardware 
>> > and so on. Backward compatibility should be a keystone of the BOINC 
>> > concept. And all these not really needed "deprecations" will play badly 
>> > with the existing userbase.
>> 
>
>_______________________________________________
>boinc_dev mailing list
>[email protected]
>http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
>To unsubscribe, visit the above URL and
>(near bottom of page) enter your email address.
>
>
>