[Bug rtl-optimization/85180] Infinite loop in RTL DSE optimizer

rguenther at suse dot de Thu, 05 Apr 2018 02:05:54 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85180


--- Comment #9 from rguenther at suse dot de <rguenther at suse dot de> ---
On Thu, 5 Apr 2018, jakub at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85180
> 
> --- Comment #8 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
> If find_base_term always returned whatever first returned non-NULL up to the
> ultimate caller, then I think the above would work fine.  Sadly, that is the
> case only in most of the spots, not all of them.
> The PLUS/MINUS case is the major exception (and *_EXTEND too, but that doesn't
> matter):
>         rtx base = find_base_term (tmp1);
>         if (base != NULL_RTX
>             && ((REG_P (tmp1) && REG_POINTER (tmp1))
>                  || known_base_value_p (base)))
>           return base;
>         base = find_base_term (tmp2);
>         if (base != NULL_RTX
>             && ((REG_P (tmp2) && REG_POINTER (tmp2))
>                  || known_base_value_p (base)))
>           return base;
> If tmp1 or tmp2 aren't REGs with REG_POINTER (the REG with REG_POINTER case
> isn't that interesting, because the find_base_term recursion is one level only
> in that case and no VALUEs are involved), then we only return whatever has 
> been
> returned if known_base_value_p (base).  So other bases (that is group (1),
> incoming arguments, right?) might be returned before your patch and not after
> it, e.g. if
> we at toplevel call find_base_term on a VALUE that has (plus (value 123) 
> (value
> 345)) and (value 456) locs, value 123 has (minus (value 456) (value 345)),
> value 345 doesn't have a base term and value 456 is incoming argument.  When
> recursing on the plus and minus, we see that find_base_term on 456 returned
> non-NULL base that is !known_base_value_p (base) and return NULL, then we walk
> value 456 again and at this point without your patch we would return the
> incoming ADDRESS, but with the patch don't, because we've already walked it.
> 
> Perhaps those cases are sufficiently rare that we don't care, still, it would
> be nice to gather at least some statistics on this on multiple targets (say
> x86_64 and powerpc64) bootstraps/regtests, tweak your patch so that the 2
> argument find_base_term can have the second argument NULL and in that case it
> behaves the old way and call it twice, once with NULL argument and then with
> non-NULL one and compare the result, gather statistics on how many toplevel
> find_base_term calls there were and how many out of them resulted in different
> result.

So we could conservatively address this by

  unsigned saved_stackpos = visited_vals.length ();
  rtx base = find_base_term (tmp1, visited_vals);
  if (base != NULL_RTX
            && ((REG_P (tmp1) && REG_POINTER (tmp1))
                 || known_base_value_p (base)))
          return base;
  if (base != NULL_RTX)
    unwind-visited_vals-to-saved-stackpos;

and the same for the tmp2 case.  But I'm not sure whether that's a good
idea since it looks like we may end up re-introducing the exponential
complexity if we never return any of the found bases.

Fixing this case with keeping the current behavior would instead ask
for caching the outcome of find_base_term on VALUEs where we then
can remove the val->locs frobbing.

The question is whether that is worth it (as you say) and what
the cost of doing so is.  It looks like we may be able to
change cselib_val in a way to make a direct-mapped cache
possible without too much overhead?  find_base_term is quite
performance sensitive IIRC.  OTOH cselib_val might be size-sensitive...
Using a bit in the RTX and re-using cselib_val->val_rtx and keeping
an unwinding stack for that might be possible if the more trivial
implementation with a hash-map is too expensive.

[Bug rtl-optimization/85180] Infinite loop in RTL DSE optimizer

Reply via email to