Hi Deepak --

Alexey (the aforementioned European collaborator) passes along notes on his GPU work that I'm attaching at the end of my message. They reinforce my sense that resurrecting Albert's work doesn't seem like a sustainable way to support OpenCL generation / APUs, particularly within the scope of an intern project. I fear that while you might accomplish a working demonstration, it would likely end up feeling more like a one-off stunt than a sound way to target APUs, and I assume that wouldn't be satisfying in the long term.

If I had a team member who was responsible for targeting APUs, I wouldn't have them work from Albert's branch, because I think the right way to represent APUs in Chapel is through hierarchical locales, and because I would also want to create a codegen story that was more complete/holistic than the one taken there. That's what would lead me to either using the LLVM back-end (if that seemed tractable) or creating an OpenCL-specific code generator (which Alexey alludes to as well). I'd worry that the latter may exceed the scope of a summer project; the former seems more tractable if it's satisfying (i.e., wouldn't also feel like a stunt and could ultimately generate good code).

W.r.t. your desire to attach pragmas to forall loops, we have some vaguely similar work going on at present, in that we are working to: (a) generate loops like 'for i in 1..10' more like the C equivalent you would expect and (b) have a capability for attaching pragmas (like a vectorization directive) to them. The idea is then to have the serial loops that express each task's local portion of a 'forall' loop use these pragmas to give more semantic information/intent to the back-end C compiler. It's hard for me to predict how long this will take for us to complete though (in particular, whether we'll get far enough fast enough for you to leverage it during your internship).

As a result of these questions and uncertainties, I'm wondering whether it might be useful to have a short phone call next week to understand better what you're trying to accomplish and wrestle through options in real time rather than over email. Let me know if that would be useful.

Alexey's message is just below.

-Brad

----

Hello Brad

As you may recall, GPU support in v.1.7 was only nominally present; indeed it was completely broken, but I managed to repair it using Albert's code from v.1.2 as a roadmap.
As a result, I was able to compile and run two simple streaming triad examples from Albert's original distribution on our GPU-enabled machine at CSCS. Needless to say, this was just a simple proof of concept, not a demo of coding a real-life GPU application with Chapel.

A bit later, when v.1.8 was released, I found that the entire GPU support had been removed from the distribution. I managed to put all the GPU-related code back into the compiler and was able to generate credible-looking code for the same simple examples. I did not, however, modify the Chapel runtime accordingly, due to a lack of resources and changing priorities. I have also realised that it might be counter-productive to update each new Chapel release in the same way, and that therefore some generic plug-ins for accelerators should perhaps be designed and integrated into the main Chapel release in a coordinated effort with the core Chapel team.

Meanwhile you introduced hierarchical locales, and I chose a completely different approach based on placing all CUDA-related functionality in a specialized back-end running behind the official Chapel compiler software stack. Right now I am working exactly in this direction, and it will take some time and effort to get any sound result (and my "denormalization" sub-project, this time for the entire serial Chapel subset and the complete set of v.1.9 compiler passes, must succeed before I even start to do anything with CUDA again).

Implementing support for OpenCL, though similar in principle, would require substantial time and a certain degree of commitment (especially if building real-life applications is the ultimate goal).

Best regards
Alexey

On Thu, 26 Jun 2014, Majeti, Deepak wrote:

> Hi Brad,
>
> I did mean the work by Albert Sidelnik. Can you please check with the
> European collaborator on the status? We can consider extending that
> work based on the status.
>
> We were indeed looking at other options like working on the intermediate
> c-code/llvm-ir code.
>
> One problem is that we have to somehow annotate the "forall" loops so
> that we can accelerate the corresponding regions in the generated
> C-Code/llvm-ir. We also have to modify the compiler generated runtime
> calls to point to the OpenCL runtime.
>
>
> From: Brad Chamberlain [mailto:[email protected]]
> Sent: Thursday, June 26, 2014 1:20 AM
> To: Majeti, Deepak; [email protected]
> Subject: RE: Regarding OpenCL support
>
> Hi Deepak --
>
> I'm guessing that by "the current CUDA generation", you're referring to the
> work that Albert Sidelnik did on a branch a few years ago? If so, I don't
> think that's a very viable way forward, as the branch is very far behind trunk
> (this was the case even before Albert finished working on it). I believe
> there was some work done by a European collaborator last year to try and
> bring this back in-line with trunk, but offhand I can't recall how far that
> got -- can check tomorrow and put you in touch if you'd like. But, even then
> I worry that the approach that one would want to take today would make use of
> hierarchical locales which would require a fairly significant re-architecting
> of things.
>
> Another option for pursuing OpenCL and/or an APU *might* be to use Chapel's
> LLVM back-end. At dinner tonight, it was conjectured that it ought to
> generate correct code but that additional wrapper code may need to be placed
> around it to offload to the APU (if I understood correctly). It was
> conjectured that coworkers of yours at AMD may have a better sense than we do
> as to whether or not there's a viable path forward taking this approach. If
> there's more that you need to know to make the call from our side, give a
> shout (on the list -- others are more knowledgeable about this than I am).
>
> -Brad
>
>
> ________________________________
> From: Majeti, Deepak [[email protected]]
> Sent: Wednesday, June 25, 2014 3:33 PM
> To: Brad Chamberlain;
> [email protected]<mailto:[email protected]>
> Subject: RE: Regarding OpenCL support
> Hi Brad,
>
>
> We want to run Chapel on AMD's APU hardware and this will require OpenCL
> generation.
>
> I am planning to support OpenCL by extending the current CUDA generation.
> Essentially modify the compiler to generate OpenCL equivalent of CUDA
> constructs.
> Any advice from your side on pursuing this direction will be very helpful.
> Thanks!
>
> From: Brad Chamberlain [mailto:[email protected]]
> Sent: Sunday, June 22, 2014 7:05 PM
> To: Majeti, Deepak;
> [email protected]<mailto:[email protected]>
> Subject: RE: Regarding OpenCL support
>
> Not at present that I am aware of. A few external developers have asked
> about this over time, but to my knowledge, I'm not aware of anyone pursuing
> it at present.
>
> -Brad
>
> ________________________________
> From: Majeti, Deepak [[email protected]]
> Sent: Friday, June 20, 2014 2:14 PM
> To:
> [email protected]<mailto:[email protected]>
> Subject: [Chapel-developers] Regarding OpenCL support
> Hi,
>
> I saw that the Chapel gpu branch supports cuda.
> Is there any effort towards OpenCL?
> Thanks!
>
> --
> Deepak Majeti
> Co-op External Research
> AMD
>
_______________________________________________
Chapel-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/chapel-developers
