On Wed, Sep 23, 2015 at 11:24:16PM +0300, Alexander Monakov wrote:
> > These patches provide stub functionality, which
> > is easy enough, but I can't tell whether there's a credible plan to provide
> > a full implementation. GPUs really need a different programming model than
> > normal CPUs, which is something I learned the hard way, and I'm not terribly
> > optimistic about porting libgomp to ptx. (I may be wrong.)
>
> Right, libgomp running on ptx would have to do many things differently from
> how it does now (and some drop entirely, like affinity). Thankfully it can be

Sure, affinity doesn't have to be supported. And eventually some simpler
constructs can be e.g. inlined by the compiler if that is desirable.
Some constructs, like tasking, are just too complex to handle without
sharing code in the library. Static scheduling loops are already expanded
inline by the compiler, except for ordered loops (which are again hard to
handle without the library side); other scheduling kinds IMHO can just be
shared with the CPU implementation, etc.

> implemented piecemeal in config/nvptx, without #ifdef butchery in the primary
> source files. The plan towards providing a full implementation is thus to

We really don't need to avoid all #ifdef stuff, just keep it to a reasonable,
maintainable level.

> > In one patch you mention newlib pthread type definitions - are you aware
> > that there is no real pthreads implementation in the ptx newlib? The ptx
> > newlib is really only provided for a minimal subset of libc functionality.
>
> Sure, I'm aware. The point was to make libgomp.h valid to be included into
> the rest of to-be-ported source files, keeping modifications to it to a
> minimum. If the idea is that relying on #include <pthread.h> available on
> nvptx in the first place is too much of a hack, we can discuss alternatives :)

I'd say for e.g. libgomp.h it is acceptable to use what I've posted in
https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01418.html, so HAVE_PTHREAD_H
and LIBGOMP_USE_PTHREAD guards. It is likely that some other offloading
target in the future (somebody has been talking about e.g. ARM offloading to
Epiphany (Parallella board)) will have the same need (i.e. no pthreads, and
either a dummy pthread.h around, or none at all).
Plus of course we need an NVPTX version of gomp_thread (), which can be
guarded with an __nvptx__ ifdef if the implementation is small, and I'd hope
it is: some CTA-local pointer and pointer arithmetic, indexed by
%tid.x / WARP_SZ or something similar.
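
To make the guard idea concrete, here is a rough sketch, not the actual
patch and not real libgomp code. HAVE_PTHREAD_H, LIBGOMP_USE_PTHREAD and
__nvptx__ are the names mentioned above; the struct layout, WARP_SIZE,
MAX_WARPS, cta_threads and the tid_x parameter (standing in for %tid.x)
are hypothetical, and how LIBGOMP_USE_PTHREAD is derived is an assumption.
It is written as plain host C so that it can be compiled and checked anywhere.

```c
#include <stddef.h>

/* Assumption: configure defines HAVE_PTHREAD_H when a usable <pthread.h>
   exists, and LIBGOMP_USE_PTHREAD is derived from it on non-nvptx targets. */
#if defined(HAVE_PTHREAD_H) && !defined(__nvptx__)
# include <pthread.h>
# define LIBGOMP_USE_PTHREAD 1
#endif

#define WARP_SIZE 32   /* hypothetical stand-in for the PTX warp size */
#define MAX_WARPS 32   /* hypothetical upper bound on warps per CTA */

/* Hypothetical, heavily reduced per-thread state. */
struct gomp_thread_sketch
{
  unsigned warp_id;
#ifdef LIBGOMP_USE_PTHREAD
  pthread_t handle;    /* only present where pthreads are really used */
#endif
};

/* On the device this would be CTA-local storage; here it is a plain array. */
static struct gomp_thread_sketch cta_threads[MAX_WARPS];

/* One slot per warp: all lanes of a warp map to the same entry. */
static inline struct gomp_thread_sketch *
gomp_thread_sketch (unsigned tid_x)
{
  return &cta_threads[tid_x / WARP_SIZE];
}
```

With this layout, thread ids 0..31 all resolve to slot 0, ids 32..63 to
slot 1, and so on, which is the "pointer plus small index" shape hoped for
above.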

> > My other concern would be not to approve changes to the gomp-4_0-branch that
> > could derail or slow down the effort to implement OpenACC, which has a much
> > better chance of being in gcc-6 than this effort. You might want to make a
> > private branch for your work.
>
> I'm unclear how this work might hurt the OpenACC efforts, and in any case I
> intend to be careful. I don't imagine there will be conflicting requirements
> to source code changes along the way. In defense of the idea of working on
> gomp4 branch, I expect that interleaving OpenACC and OpenMP work on a common
> branch will cause less pain in case of inadvertent breakage than a merge
> afterward. Jakub, since you suggested submitting for gomp-4_0-branch, what's
> your recommendation here?

I suggested adding this to gomp-4_0-branch rather than e.g. gomp-4_1-branch
or trunk directly because even at the beginning it has some dependencies on
stuff that has not been merged into trunk yet, in particular the nvptx
changes to libgomp that are on the branch and the code to link libgcc and/or
libgomp statically into the nvptx offloaded chunks.
Once those pieces are merged into trunk, obviously it could be developed on
some other branch, but I'd hope none of the changes will actually be
problematic for the OpenACC effort; OpenACC uses only a minimal set of files
from libgomp, and I bet that is not going to change much with the patches.

As for merging plans, the OpenMP 4.1 standard is approaching its final form
quickly, so I expect to merge gomp-4_1-branch to trunk around October 15th.
It would be nice if the gomp-4_0-branch stuff (at least the parts
Thomas/Nathan want to see in GCC 6) were in the process of being merged
shortly after that (I know I'm behind with patch review and am very sorry
for that; I will try to find more time for it in the second half of October
and early November).

As for this NVPTX OpenMP 4.1 port, I'd say it really depends on how invasive
it is to other parts of the compiler. Parts of it that can't destabilize
OpenMP 4.1 host, XeonPhi/XeonPhi-emul, or OpenACC support can go in even
during stage3 (of course on a case by case basis).

So I'd like to ask Thomas/Nathan if they are ok with this stuff being on the
gomp-4_0-branch for now; once all the prerequisites it needs are on the
trunk, it can go into its own branch.

	Jakub