> On Mon, Dec 6, 2021 at 4:03 PM Richard Biener
> <richard.guent...@gmail.com> wrote:
> >
> > On Mon, Dec 6, 2021 at 11:10 AM Richard Sandiford
> > <richard.sandif...@arm.com> wrote:
> > >
> > > Richard Biener <richard.guent...@gmail.com> writes:
> > > > On Sun, Dec 5, 2021 at 10:59 PM Richard Sandiford via Gcc-patches
> > > > <gcc-patches@gcc.gnu.org> wrote:
> > > >>
> > > >> When compiling an optabs.ii at -O2 with a release-checking build,
> > > >> we spent a lot of time in call_may_clobber_ref_p.  One of the
> > > >> first things the function tries is the get_modref_function_summary
> > > >> approach, but that also seems to be the most expensive check.

In general it should not be that expensive since the trees are generally
small (despite the 3 nested loops making them look dangerously big) and
most of time we resort to simple pointer/array lookups.

>From lcov stats https://splichal.eu/lcov/gcc/tree-ssa-alias.c.gcov.html
(which are quite useful when trying to understand typical behaviour of
the alias oracle)

In call_may_clobber_ref_p we get

    2980                 :  256141790 :   callee = gimple_call_fndecl (call);
    2981                 :            : 
    2982                 :  256141790 :   if (callee != NULL_TREE && 
!ref->volatile_p)
    2983                 :            :     {
    2984                 :  240446150 :       struct cgraph_node *node = 
cgraph_node::get (callee);
this is direct pointer from node
    2985                 :  240446150 :       if (node)
    2986                 :            :         {
    2987                 :  240234208 :           modref_summary *summary = 
get_modref_function_summary (node);
This is fast summary so it is array lookup
    2988                 :  240234208 :           if (summary)
    2989                 :            :             {
    2990                 :   19732392 :               if (!modref_may_conflict 
(call, summary->stores, ref, tbaa_p)

And we get here in 8% of invocations

And in modref_may_conflict we have:


   21147836 : modref_may_conflict (const gcall *stmt,
    2552                 :            :                      modref_tree 
<alias_set_type> *tt, ao_ref *ref, bool tbaa_p)
    2553                 :            : {
    2554                 :   21147836 :   alias_set_type base_set, ref_set;
    2555                 :            : 
    2556                 :   21147836 :   if (tt->every_base)
    2557                 :            :     return true;
    2558                 :            : 
    2559                 :    3407876 :   if (!dbg_cnt (ipa_mod_ref))
    2560                 :            :     return true;
    2561                 :            : 
    2562                 :    3407876 :   base_set = ao_ref_base_alias_set 
(ref);
    2563                 :            : 
    2564                 :    3407876 :   ref_set = ao_ref_alias_set (ref);
All direct lookups in most cases
    2565                 :            : 
    2566                 :    3407876 :   int num_tests = 0, max_tests = 
param_modref_max_tests;
    2567                 :   14479077 :   for (auto base_node : tt->bases)
    2568                 :            :     {
    2569                 :    6171739 :       if (tbaa_p && 
flag_strict_aliasing)
At average there are 2 bases travelled by the loop.
    2570                 :            :         {
    2571                 :    5253690 :           if (num_tests >= max_tests)
    2572                 :            :             return true;
    2573                 :    5253690 :           alias_stats.modref_tests++;
    2574                 :    5253690 :           if (!alias_sets_conflict_p 
(base_set, base_node->base))
alias set checks cheecks are also reasonably cheap and 50% of time avoids 
walking rest of the tree
    2575                 :    2238398 :             continue;
    2576                 :    3015292 :           num_tests++;
    2577                 :            :         }
    2578                 :            : 
    2579                 :    3933341 :       if (base_node->every_ref)

We hit the loop only 0.15 times per invocation of the function

    2580                 :            :         return true;
    2581                 :            : 
    2582                 :   14943624 :       for (auto ref_node : 
base_node->refs)
    2583                 :            :         {
    2584                 :            :           /* Do not repeat same test as 
before.  */
    2585                 :    4381325 :           if ((ref_set != base_set || 
base_node->base != ref_node->ref)

and usually visit only bit more htan one refs
    2586                 :    2548127 :               && tbaa_p && 
flag_strict_aliasing)
    2587                 :            :             {
    2588                 :    1840866 :               if (num_tests >= 
max_tests)
    2589                 :            :                 return true;
    2590                 :    1809594 :               
alias_stats.modref_tests++;
    2591                 :    1809594 :               if 
(!alias_sets_conflict_p (ref_set, ref_node->ref))
    2592                 :     314793 :                 continue;
    2593                 :    1494801 :               num_tests++;
    2594                 :            :             }
    2595                 :            : 
    2596                 :    4035260 :           if (ref_node->every_access)
    2597                 :            :             return true;
    2598                 :            : 
    2599                 :            :           /* TBAA checks did not 
disambiguate, try individual accesses.  */
    2600                 :   13186541 :           for (auto access_node : 
ref_node->accesses)

    2601                 :            :             {
    2602                 :    3770816 :               if (num_tests >= 
max_tests)

Again this loop usually executes only once
    2603                 :     463332 :                 return true;
    2604                 :            : 
    2605                 :    3770816 :               tree arg = 
access_node.get_call_arg (stmt);

here more expensive work starts (we will do ao ref oracle matching
but only in 0.1 times per invocation.

So in general it should not be too bad as I am also watching number of
querries produced by this loop in the stats.

However moving PTA tests ahead is probably a good idea (I guess they
resort to bitmap operaitons that also do some looping).
> > > 3786              if (stmt_may_clobber_ref_p_1 (def_stmt, ref, tbaa_p))
> > > (gdb) call debug (def_stmt)
> > > __builtin_memset (p_3(D), 0, 28);
> > > (gdb) call debug (ref.base)
> > > *p_3(D)
> > > (gdb) call pt_solution_includes (&((gcall *) def_stmt)->call_clobbered, 
> > > ref.base)
> > > $3 = false
> >
> > huh, well - *p_3(D) isn't a DECL and pt_solution_includes just works for 
> > decls.
> > In fact it should eventually ICE even ;)  That said, nothing should
> > really iniitalize
> > the call_clobber set besides pt_solution_reset at GIMPLE stmt allocation 
> > time
> > or IPA PTA when enabled.  pt_solutuon_reset sets ->anything to true so the
> > function should return true ...

Even before my patch we computed call clobbered. handle_rhs_call sets
call clobbers for non-readonly nonescape args via get_call_clobber_vi
and later in compute_points_to_sets this info is copied into
gimple_call_clobbered_set:

          pt = gimple_call_clobber_set (stmt);                                  
          if (gimple_call_flags (stmt) & (ECF_CONST|ECF_PURE|ECF_NOVOPS))       
            memset (pt, 0, sizeof (struct pt_solution));                        
          else if ((vi = lookup_call_clobber_vi (stmt)) != NULL)                
            {                                                                   
              *pt = find_what_var_points_to (cfun->decl, vi);                   
              /* Escaped (and thus nonlocal) variables are always               
                 implicitly clobbered by calls.  */                             
              /* ???  ESCAPED can be empty even though NONLOCAL                 
                 always escaped.  */                                            
              pt->nonlocal = 1;                                                 
              pt->escaped = 1;                                                  
            }                                                                   
          else                                                                  
            {                                                                   
              /* If there is nothing special about this call then               
                 we have made everything that is used also escape.  */          
              *pt = cfun->gimple_df->escaped;                                   
              pt->nonlocal = 1;                                                 
            }                                                                   

I think you are hitting the following:

static bool                                                                     
find_func_aliases_for_builtin_call (struct function *fn, gcall *t)              
{                                                                               
  tree fndecl = gimple_call_fndecl (t);                                         
  auto_vec<ce_s, 2> lhsc;                                                       
  auto_vec<ce_s, 4> rhsc;                                                       
  varinfo_t fi;                                                                 
                                                                                
  if (gimple_call_builtin_p (t, BUILT_IN_NORMAL))                               
    /* ???  All builtins that are handled here need to be handled               
       in the alias-oracle query functions explicitly!  */                      
    switch (DECL_FUNCTION_CODE (fndecl))                                        
      {                                                                         
      /* All the following functions return a pointer to the same object        
         as their first argument points to.  The functions do not add           
         to the ESCAPED solution.  The functions make the first argument        
         pointed to memory point to what the second argument pointed to         
         memory points to.  */                                                  
      case BUILT_IN_STRCPY:                                                     

Notice the ??? comment. The code does not set clobbers here because it
assumes that tree-ssa-alias will do the right thing.
So one may make builtins handling first, PTA next and only if both say
"may alias" continue.  Other option is to extend the code here to add
propert clobber/use PTA to make things more regular.  I would be happy
to help with that.

Honza
> >
> > It looks like Honza added computation of these sets at some point, also 
> > maybe
> > forgetting to set some bits here.
> >
> > I prefer to investigate / fix this before refactoring the uses in this way. 
> >  Can
> > you file a bugreport please?  I can reproduce the memcpy FAIL with
> > just moving check_fnspec down.
> 
> So the issue seems that for
> # .MEM_22 = VDEF <.MEM_19>
> memcpy (&dst, &src, 1024);
> 
> determine_global_memory_access determines the stmt doesn't write
> to global memory (well, yes!), but it fails to add the actual arguments
> to the set of clobbered vars.
> 
> Honza, can you please look into this?  Seems to be caused
> by g:008e7397dad971c03c08fc1b0a4a98fddccaaed8
> 
> Richard.
> 
> > Thanks,
> > Richard.
> >
> > > Setting ASSERTS to 0 gives:
> > >
> > > FAIL: gcc.c-torture/execute/memcpy-1.c   -O2  execution test
> > > FAIL: gcc.c-torture/execute/memcpy-1.c   -O2 -flto -fno-use-linker-plugin 
> > > -flto-partition=none  execution test
> > > FAIL: gcc.c-torture/execute/memcpy-1.c   -O2 -flto -fuse-linker-plugin 
> > > -fno-fat-lto-objects  execution test
> > > FAIL: gcc.c-torture/execute/memcpy-1.c   -O3 -fomit-frame-pointer 
> > > -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
> > > FAIL: gcc.c-torture/execute/memcpy-1.c   -O3 -g  execution test
> > > FAIL: gcc.c-torture/execute/memcpy-1.c   -Os  execution test
> > > FAIL: gcc.c-torture/execute/memset-3.c   -O3 -fomit-frame-pointer 
> > > -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
> > > FAIL: gcc.c-torture/execute/memset-3.c   -O3 -g  execution test
> > >
> > > # of expected passes            23138
> > > # of unexpected failures        8
> > > # of unsupported tests          24
> > >
> > > (same for -m32).
> > >
> > > Originally I saw it as a bootstrap failure building stage 2
> > > build-side libcpp, due to a bogus uninitialised warning.
> > >
> > > But alongside the memset and free examples, there's:
> > >
> > >       /* All the following functions do not return pointers, do not
> > >          modify the points-to sets of memory reachable from their
> > >          arguments and do not add to the ESCAPED solution.  */
> > >       case BUILT_IN_SINCOS:
> > >       case BUILT_IN_SINCOSF:
> > >       ...
> > >         return true;
> > >
> > > (always an empty set), whereas check_fnspec takes errno into account.
> > >
> > > > I wonder if, at this point, we should split the patch up to do the
> > > > trivial move of the ao_ref_base dependent and volatile checks
> > > > and leave this more complicated detail for a second patch?
> > >
> > > Yeah, that'd work, but in practice it was the PT part that was the
> > > high-value part.  IIRC the early base checks caught less than 1% of
> > > cases (I can rerun tonight to get exact numbers).
> > >
> > > Thanks,
> > > Richard
> > >

Reply via email to