Hi Jakub!

On Fri, 22 Jan 2016 09:36:25 +0100, Jakub Jelinek <ja...@redhat.com> wrote:
> On Fri, Jan 22, 2016 at 08:40:26AM +0100, Thomas Schwinge wrote:
> > On Thu, 21 Jan 2016 22:54:26 +0100, I wrote:
> > > On Mon, 18 Jan 2016 18:26:49 +0100, Tom de Vries <tom_devr...@mentor.com>
> > > wrote:
> > > > [...] [OpenACC] kernels region [...]
> > > > that parloops does not manage to parallelize:
> > >
> > > Telling from real-world code that we've been having a look at, when the
> > > above situation happens, we're -- in the vast majority of all cases -- in
> > > a situation where we generally want to avoid offloading (unless
> > > explicitly requested), "to avoid data copy penalty" as well as typically
> > > much slower single-threaded execution on the GPU. Obviously, that will
> > > have to be revisited as parloops (or any other mechanism in GCC) is able
> > > to better understand/use the parallelism in OpenACC kernels constructs.
> > >
> > > So, building upon Tom's patch, I have implemented an "avoid offloading"
> > > flag given the presence of one un-parallelized OpenACC kernels construct.
> > > This is currently only enabled for OpenACC kernels constructs, in
> > > combination with nvptx offloading, but I think the general scheme will be
> > > useful also for other constructs as well as other (non-shared memory)
> > > offloading targets.
> > >
> > > Committed to gomp-4_0-branch in r232709:
> > >
> > > commit 41a76d233e714fd7b79dc1f40823f607c38306ba
> > > Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
> > > Date: Thu Jan 21 21:52:50 2016 +0000
> > >
> > >     Un-parallelized OpenACC kernels constructs with nvptx offloading:
> > >     "avoid offloading"
> >
> > Thought I'd check before porting it over -- will such a patch also be
> > accepted for trunk?
>
> I think it is a bad idea to go against what the user wrote. Warning that
> some code might not be efficient? Perhaps (if properly guarded with some
> warning option one can turn off, either on a per-source file or using
> pragmas even more fine grained). But by default not offloading? That is
> just wrong.

Well, let's argue the opposite way round: a user annotated the source
code with directives to help the compiler identify
parallelization/offloading opportunities. These directives are just
descriptive hints, however; the compiler is free (obeying program
semantics, of course) to ignore them, or to pay attention to only some of
them. Suppose the compiler didn't find any parallelization
opportunities, but it knows that, compared to host-fallback execution,
offloading will be slower for single-threaded code (data copy penalty,
slower GPU clock speed); then it only makes sense not to offload the code
in such cases.

This is, quite possibly, semantically different from OpenMP directives,
where the compiler typically does exactly what the user prescribes. (But
even there, you can automatically apply SIMD parallelism, for example.
You just have to make sure that it doesn't interfere with the program
semantics, basically that the user "won't notice".)

Does that clarify?


Grüße
 Thomas
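
For illustration only, here is a minimal sketch of the kind of situation
discussed above. It is a hypothetical example (the function name
prefix_sum is made up, and this is not the test case from Tom's patch):
an OpenACC kernels construct whose loop carries a dependence from one
iteration to the next, so there is presumably no parallelism for the
compiler to extract, and offloading would mostly add the data copy.

```c
/* Hypothetical example, not taken from the patch or its test suite.
   Each iteration reads the element written by the previous iteration,
   so the loop carries a dependence and cannot simply be split across
   threads.  */
void
prefix_sum (int *a, int n)
{
  /* The directive is guarded so that the code also builds cleanly with
     compilers that do not enable OpenACC (_OPENACC is only defined when
     OpenACC support is active).  */
#ifdef _OPENACC
#pragma acc kernels copy (a[0:n])
#endif
  {
    for (int i = 1; i < n; i++)
      a[i] += a[i - 1];
  }
}
```

Calling prefix_sum on an array of eight ones leaves 8 in the last
element, whether the region runs on the host or on an accelerator; only
the execution time differs, which is the "won't notice" property
mentioned above.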