On 06/29/2016 07:11 AM, Thomas Schwinge wrote:

> Cesar, I have not yet fully digested this, but do I understand right that
> you're really fixing two issues here, that are related (OpenACC routines)
> but still can be addressed independently of each other?  Do I understand
> right that the first one, the "problems with acc routines [...]
> incorrectly permitting 'acc seq' loops to call gang, worker and vector
> routines" is just a Fortran front end patch?  If yes, please split that
> one out, so as to reduce the volume of remaining changes that remain to
> be discussed.

This patch addresses the following issues:

 1. Issues warnings when a non-acc routine function is called inside an
    OpenACC offloaded region.

 2. It corrects a bug what was allowing seq loops to call gang, worker
    and vector routines.

 3. It adds supports for acc routines in fortran modules (which I
    noticed was missing when I added 'acc routine seq' to acc_on_device
    in the fortran openacc include files).

I'll split these into separate patches.

> On Thu, 23 Jun 2016 09:05:38 -0700, Cesar Philippidis 
> <ce...@codesourcery.com> wrote:
>> On 06/17/2016 07:42 AM, Jakub Jelinek wrote:
>>> On Wed, Jun 15, 2016 at 08:12:15PM -0700, Cesar Philippidis wrote:
>>>> The second set of changes involves teaching the gimplifier to error when
>>>> it detects a function call to an non-acc routines inside an OpenACC
>>>> offloaded region.
> 
> As I understand, that's the same problem as has been discussed before
> (Ilya CCed), and has recently again been filed in
> <https://gcc.gnu.org/PR71499> "ICE in LTO1 when attempting NVPTX
> offloading (-fopenacc)", and <https://gcc.gnu.org/PR71535> "ICE in LTO1
> with -fopenmp offloading" (Alexander CCed).  Some earlier discussion
> threads include:
> <http://news.gmane.org/find-root.php?message_id=%3C20150109145702.GA45210%40msticlxl57.ims.intel.com%3E>,
> <http://news.gmane.org/find-root.php?message_id=%3C20150724152119.GA41292%40msticlxl57.ims.intel.com%3E>,
> <http://news.gmane.org/find-root.php?message_id=%3C56269C05.6030502%40acm.org%3E>.
> 
>>>> Actually, I relaxed non-acc routines by excluding
>>>> calls to builtin functions, including those prefixed with _gfortran_.
>>>> Nvptx does have a newlib c library, and it also has a subset of
>>>> libgfortran. Still, this solution is probably not optimal.
>>>
>>> I don't really like that, hardcoding prefixes or whatever is available
>>> (you have quite some subset of libc, libm etc. available too) in the
>>> compiler looks very hackish.  What is wrong with complaining during
>>> linking of the offloaded code?
> 
> ACK.  Jakub, do I understand you correctly, that you basically say that
> every function declaration that is in scope inside offloaded regions (for
> example, GCC builtin functions, or standard library functions declared in
> target compiler's header files) is permitted to be called in offloaded
> regions, and the offloading compiler will then either be able to resolve
> these (nvptx back end knows about trigonometric functions, for example,
> and a lot of functions are available in the nvptx libc), or otherwise
> error out during the offloading compilation (during linking), gracefully
> without terminating the target compilation (that "gracefully" bit is
> currently missing -- that's for another day).  That is, all such
> functions are implicitly callable as OpenACC "seq" functions (which means
> that they don't internally use gang/worker/vector parallelism).  In
> particular, all these functions do *not* need to be marked with an
> explicit "#pragma acc routine seq" directive.  (Functions internally
> using gang/worker/vector parallelism will need to be marked
> appropriately, using a "#pragma acc routine gang/worker/vector"
> directive.)  That's how I understand your comment above, and your earlier
> comments on this topic, and also is what I think should be done.

OK. I'll drop the warning changes from my patch set then unless you want
to keep it.

> A few random comments on the patch:
> 
>> --- a/gcc/fortran/gfortran.h
>> +++ b/gcc/fortran/gfortran.h
>> @@ -303,6 +303,15 @@ enum save_state
>>  { SAVE_NONE = 0, SAVE_EXPLICIT, SAVE_IMPLICIT
>>  };
>>  
>> +/* Flags to keep track of ACC routine states.  */
>> +enum oacc_function
>> +{ OACC_FUNCTION_NONE = 0,
>> +  OACC_FUNCTION_SEQ,
>> +  OACC_FUNCTION_GANG,
>> +  OACC_FUNCTION_WORKER,
>> +  OACC_FUNCTION_VECTOR
>> +};
> 
> What's the purpose of OACC_FUNCTION_NONE?  It's not used anywhere, as far
> as I can tell?

It's used by the fortran module code. It controls how parallelism gets
encoded in the .mod files.

>> --- a/gcc/fortran/openmp.c
>> +++ b/gcc/fortran/openmp.c
>> @@ -1664,21 +1664,31 @@ gfc_match_oacc_cache (void)
>>  
>>  /* Determine the loop level for a routine.   */
>>  
>> -static int
>> +static oacc_function
>>  gfc_oacc_routine_dims (gfc_omp_clauses *clauses)
>>  {
>>    int level = -1;
>> +  oacc_function ret = OACC_FUNCTION_SEQ;
>>  
>>    if (clauses)
>>      {
>>        unsigned mask = 0;
>>  
>>        if (clauses->gang)
>> -    level = GOMP_DIM_GANG, mask |= GOMP_DIM_MASK (level);
>> +    {
>> +      level = GOMP_DIM_GANG, mask |= GOMP_DIM_MASK (level);
>> +      ret = OACC_FUNCTION_GANG;
>> +    }
>>        if (clauses->worker)
>> -    level = GOMP_DIM_WORKER, mask |= GOMP_DIM_MASK (level);
>> +    {
>> +      level = GOMP_DIM_WORKER, mask |= GOMP_DIM_MASK (level);
>> +      ret = OACC_FUNCTION_WORKER;
>> +    }
>>        if (clauses->vector)
>> -    level = GOMP_DIM_VECTOR, mask |= GOMP_DIM_MASK (level);
>> +    {
>> +      level = GOMP_DIM_VECTOR, mask |= GOMP_DIM_MASK (level);
>> +      ret = OACC_FUNCTION_VECTOR;
>> +    }
>>        if (clauses->seq)
>>      level = GOMP_DIM_MAX, mask |= GOMP_DIM_MASK (level);
>>  
> 
> I have not looked in detail, so maybe I'm misunderstanding what is being
> done here -- but how do "clauses->seq" and "level = GOMP_DIM_MAX" fit
> together?

Conceptually, if you take a look at the oacc_function attribute in a
tree dump, you'll see an array with three elements. Basically, each
element in that array represents a gang, worker or vector parallelism.
By definition, a gang loop permits a worker and vector loop to be nested
inside it. So, for a gang routine, the oacc_function attribute is
constructed such that it permits gang, worker and vector level
parallelism. Similarly, for a worker routine, the oacc_function
attribute has the worker and vector level parallelism 'bits' set.

With that in mind, setting seq to GOMP_DIM_MASK allows the loop creating
that oacc_function attribute to mask out any gang, worker and vector
parallelism.

>> @@ -1689,7 +1699,7 @@ gfc_oacc_routine_dims (gfc_omp_clauses *clauses)
>>    if (level < 0)
>>      level = GOMP_DIM_MAX;
>>  
>> -  return level;
>> +  return ret;
>>  }
> 
> Just from that last hunk, it seems that the assignment to "level" is a
> dead store?

I'll need to check this when I split out the patch.

>> +static tree
>> +add_attributes_to_decl (symbol_attribute sym_attr, tree list)
>> +{
>> +  unsigned id;
>> +  tree attr;
>> +
>> +  for (id = 0; id < EXT_ATTR_NUM; id++)
>> +    if (sym_attr.ext_attr & (1 << id))
>> +      {
>> +    attr = build_tree_list (
>> +             get_identifier (ext_attr_list[id].middle_end_name),
>> +                             NULL_TREE);
>> +    list = chainon (list, attr);
>> +      }
>> +
>> +  list = add_omp_offloading_attributes (sym_attr.omp_declare_target,
>> +                                    sym_attr.oacc_function, list);
>> +
>> +  return list;
>> +}
> 
> Something that I had noticed before, possibly related here: code in
> gcc/fortran/ does never call replace_oacc_fn_attrib, but the C and C++
> front ends do.  Is that function what you've re-implemented here?

Similar, but I broke this code out from another function to handle
BUILT_IN_EXPECT. But I can revert this change now, since BUILT_IN_EXPECT
will be treated as an implicit SEQ routine.

>> --- a/gcc/lto-cgraph.c
>> +++ b/gcc/lto-cgraph.c
>> @@ -1201,9 +1201,11 @@ input_overwrite_node (struct lto_file_decl_data 
>> *file_data,
>>                                   LDPR_NUM_KNOWN);
>>    node->instrumentation_clone = bp_unpack_value (bp, 1);
>>    node->split_part = bp_unpack_value (bp, 1);
>> -  gcc_assert (flag_ltrans
>> -          || (!node->in_other_partition
>> -              && !node->used_from_other_partition));
>> +
>> +  int success = flag_ltrans || (!node->in_other_partition
>> +                            && !node->used_from_other_partition);
>> +  if (!success)
>> +    error ("Missing %<%s%>", node->name ());
>>  }
>>  
>>  /* Return string alias is alias of.  */
>> @@ -1416,9 +1418,11 @@ input_varpool_node (struct lto_file_decl_data 
>> *file_data,
>>      node->set_section_for_node (section);
>>    node->resolution = streamer_read_enum (ib, ld_plugin_symbol_resolution,
>>                                              LDPR_NUM_KNOWN);
>> -  gcc_assert (flag_ltrans
>> -          || (!node->in_other_partition
>> -              && !node->used_from_other_partition));
>> +
>> +  int success = flag_ltrans || (!node->in_other_partition
>> +                            && !node->used_from_other_partition);
>> +  if (!success)
>> +    error ("Missing %<%s%>", node->name ());
>>  
>>    return node;
>>  }
> 
> That looks similar to what I remember from earlier, simiar patches, as
> referenced above.

It is. I never got around to pushing that patch very strongly because I
thought those link failures were legitimate compiler bugs.

>> @@ -19420,7 +19442,8 @@ oacc_loop_fixed_partitions (oacc_loop *loop, 
>> unsigned outer_mask)
>>      {
>>        unsigned outermost = this_mask & -this_mask;
>>  
>> -      if (outermost && outermost <= outer_mask)
>> +      if ((outermost && outermost <= outer_mask)
>> +      || (this_mask && (loop->parent->flags & OLF_SEQ)))
>>      {
>>        if (noisy)
>>          {
> 
>> --- a/gcc/testsuite/c-c++-common/goacc/routine-3.c
>> +++ b/gcc/testsuite/c-c++-common/goacc/routine-3.c
>> @@ -49,7 +49,7 @@ main ()
>>    int red = 0;
>>  #pragma acc parallel copy (red)
>>    {
>> -    /* Independent/seq loop tests.  */
>> +    /* Independent loop tests.  */
>>  #pragma acc loop reduction (+:red) // { dg-warning "insufficient 
>> partitioning" }
>>      for (int i = 0; i < 10; i++)
>>        red += gang ();
>> @@ -62,6 +62,19 @@ main ()
>>      for (int i = 0; i < 10; i++)
>>        red += vector ();
>>  
>> +    /* Seq loop tests.  */
>> +#pragma acc loop seq reduction (+:red)  /* { dg-message "containing loop" } 
>> */
>> +    for (int i = 0; i < 10; i++)
>> +      red += gang (); /* { dg-error "incorrectly nested" } */
>> +
>> +#pragma acc loop seq reduction (+:red)  /* { dg-message "containing loop" } 
>> */
>> +    for (int i = 0; i < 10; i++)
>> +      red += worker (); /* { dg-error "incorrectly nested" } */
>> +
>> +#pragma acc loop seq reduction (+:red)  /* { dg-message "containing loop" } 
>> */
>> +    for (int i = 0; i < 10; i++)
>> +      red += vector (); /* { dg-error "incorrectly nested" } */
>> +    
>>      /* Gang routine tests.  */
>>  #pragma acc loop gang reduction (+:red)  /* { dg-message "containing loop" 
>> } */
>>      for (int i = 0; i < 10; i++)
> 
> Do these test case changes actually relate to any of the compiler changes
> discussed above?  Maybe to the oacc_loop_fixed_partitions cited just
> above?  Is that a separate issue to fix?  Eh, or is that actually the fix
> for your first issue, the "problems with acc routines [...] incorrectly
> permitting 'acc seq' loops to call gang, worker and vector routines"?

This is issue 2, and I'll break it out into a separate patch.

>> --- 
>> a/libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-kernels-ipa-pta.c
>> +++ 
>> b/libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-kernels-ipa-pta.c
>> @@ -1,4 +1,4 @@
>>  /* { dg-do run { target lto } } */
>> -/* { dg-additional-options "-fipa-pta -flto -flto-partition=max" } */
>> +/* { dg-additional-options "-fipa-pta -flto -flto-partition=max 
>> -fno-exceptions" } */
>>  
>>  #include "data-clauses-kernels.c"
> 
>> --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-kernels.c
>> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-kernels.c
>> @@ -1,2 +1,4 @@
>> +/* { dg-additional-options "-fno-exceptions" }  */
>> +
>>  #define CONSTRUCT kernels
>>  #include "data-clauses.h"
> 
>> --- 
>> a/libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-parallel-ipa-pta.c
>> +++ 
>> b/libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-parallel-ipa-pta.c
>> @@ -1,4 +1,4 @@
>>  /* { dg-do run { target lto } } */
>> -/* { dg-additional-options "-fipa-pta -flto -flto-partition=max" } */
>> +/* { dg-additional-options "-fipa-pta -flto -flto-partition=max 
>> -fno-exceptions" } */
>>  
>>  #include "data-clauses-parallel.c"
> 
>> --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-parallel.c
>> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/data-clauses-parallel.c
>> @@ -1,2 +1,4 @@
>> +/* { dg-additional-options "-fno-exceptions" }  */
>> +
>>  #define CONSTRUCT parallel
>>  #include "data-clauses.h"
> 
> Hmm?

I'm not sure what happened here either. Maybe adding the 'acc routine'
directive to acc_on_device is preventing that function from expanding to
its builtin function counterpart, which caused gcc to generate exception
code?

Cesar

Reply via email to