On Tue, Feb 07, 2012 at 08:16:23PM +0000, Peter Collingbourne wrote: > On Fri, Feb 03, 2012 at 05:24:38PM -0500, Tom Stellard wrote: > > On Fri, Feb 03, 2012 at 10:07:03PM +0000, Rotem, Nadav wrote: > > > Tom, > > > > > > Our OpenCL implementation of get_global_id is not target specific and we > > > don't resolve it in the backend. I think that get_global_id should be > > > implemented as a simple library call. However, this is something that > > > needs to be discussed with Tanya Lattner, Peter Collingbourne, Anton > > > Lokhmotov, etc. > > > > > > Nadav > > > > Nadav, > > > > Sorry, I guess I should be more clear. When I say target specific I'm > > talking about GPU targets. The get_global_id() implementation > > on the GPUs we've written a backend for (Evergreen, Northern Islands) > > requires reading values from special registers that are preloaded by the > > hardware. I'm guessing other GPUs do something similar, so I do think > > it is something that would need to be resolved in the backend. >
Hi Peter, > Hi Tom, > > I disagree. There is no need to resolve these functions in the > backend, as the register reads can simply be made part of the > get_global_id implementation. The libclc OpenCL C standard library > already targets NVIDIA GPUs, and it implements get_global_id and the > other work-item functions in exactly this way: > > http://git.pcc.me.uk/?p=~peter/libclc.git;a=blob;f=ptx-nvidiacl/include/clc/workitem/get_global_id.h > > I looked at this before and it seems like a good approach, but the one thing I couldn't figure out was: If I have clang embedded in my OpenCL implementation and it is using an out of tree backend, how do I get clang to recognize my target's builtins as valid and then map them to the appropriate target specific intrinsic? Thanks, Tom > Your way, we leak language-specific details into the backends, details > which can be dealt with by the frontend and standard library, so that > the backends can be kept language independent. > > Also, there is no one way of implementing functions like get_global_id > on (say) PTX. One thing that is missing from the get_global_id > implementation in libclc is global offsets. The precise details > of how those offsets are passed to the kernel have varied over > time (for example, at one point a global array was used, and now > special registers are used). If we encode these sorts of details > in the backend we reduce overall flexibility and deny optimisation > opportunities to the optimisers. For example, say the kernel computes: > > get_global_id(0) - get_global_offset(0) > > On PTX the optimisers should be able to eliminate the global offset > access altogether, but they might not be able to if get_global_id > and get_global_offset are opaque intrinsic calls. > > Thanks, > -- > Peter > _______________________________________________ cfe-commits mailing list [email protected] http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
