https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122545

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Manuel López-Ibáñez from comment #2)
> (In reply to Richard Biener from comment #1)
> > The issue is that quite a lot of "semantic" changing flags are not reflected
> > into the IL, so a "finite-math-only" FP add looks the same as a not
> > "finite-math-only" one.  We can track those flags on a per function level
> > only, and granularity there is not going to improve.  Some flags might be
> > reflected onto the operation, but the exact way is difficult and it will
> > have a large impact on the whole code base.
> > 
> > In principle one can inline into "more conservative" callers and we do that
> > (see can_inline_edge_by_limits_p doing check_{match,maybe_up,maybe_down}),
> > but IPA inlining happens quite early and so there's no chance to get
> > vectorized code inlined, instead you'd get the non-vectorized IL inlined
> > and then not vectorized (because now in more conservative context).
> 
> But this is not what seems to be happening here.
> 
> gcc -O3 -march=x86-64-v3 -fopt-info-vec-optimized-missed
> -D_attr_finite_math= -U_attr_finite_math_helper
> 
> The inner loop is vectorized. epsilon_helper_() is NOT completely inlined.

Indeed.  I'd have expected inlining here.  -Winline shows

t.c:21:1: warning: inlining failed in call to ‘epsilon_helper_.constprop’:
optimization level attribute mismatch [-Winline]
   21 | epsilon_helper_(bool do_mult, const enum objs_agree_t agree,
      | ^~~~~~~~~~~~~~~
t.c:116:12: note: called from here
  116 |     return epsilon_helper_(/* do_mult=*/false, AGREE_MAXIMISE,
/*minmax=*/NULL, dim, points_a, size_a, points_b, size_b);
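
For reference, a minimal sketch of the kind of setup that can lead to this
diagnostic. This is a hypothetical reduction, not the reporter's t.c: the
names max_of and largest are made up, and whether GCC actually reports the
mismatch for it depends on the GCC version and the command-line flags.

```c
#include <stddef.h>

/* Hypothetical helper: the optimize attribute requests -ffinite-math-only
   for this function only, so its optimization settings differ from the
   ones given on the command line.  */
__attribute__((optimize("finite-math-only")))
static inline double max_of(const double *v, size_t n)
{
    double m = v[0];
    for (size_t i = 1; i < n; i++)
        m = v[i] > m ? v[i] : m;   /* FP select affected by the FP flags */
    return m;
}

/* Hypothetical caller: compiled with the command-line settings, so the
   inliner has to decide whether the two sets of flags are compatible.  */
double largest(const double *v, size_t n)
{
    return max_of(v, n);
}
```

Compiling something like this with -O3 -Winline is one way to check whether
the inliner rejects the call edge because of the differing flags.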


> The loop is vectorized but the inlining decisions have changed.
> 
> Also, these flags only affect FP operations, so the IL could either encode
> different FP ops for finite-math/signed-zeros or it could add extra flags
> for those operations that encode the behavior, so MAX_EXPR(a,b,
> FP_FINITE_MATH | FP_SIGNED_ZEROS) (for those operations where the difference
> matters). In fact, that seems to be happening already to some extent (the
> call to fmax does not survive into SSA with -D_attr_finite_math=
> -U_attr_finite_math_helper).

Sure, there are many possibilities - but all code handling MAX_EXPR would
have to care, either for correctness or to not lose optimization.
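
To illustrate what "would have to care" means, here is a toy sketch. It is
not GCC's IL and none of these types or functions exist in GCC; struct
max_op, the FP_* bits (treated here as per-operation relaxations) and both
helpers are assumptions made up for the example.

```c
#include <stdbool.h>

/* Hypothetical per-operation relaxation bits.  */
enum fp_flags {
    FP_NONE            = 0,
    FP_FINITE_MATH     = 1 << 0,  /* operands assumed not NaN/Inf */
    FP_NO_SIGNED_ZEROS = 1 << 1   /* sign of zero assumed irrelevant */
};

/* Toy stand-in for a MAX operation; operands are just value numbers.  */
struct max_op {
    int lhs;
    int rhs;
    unsigned flags;
};

/* Correctness side: a CSE-like pass may only treat two operations as the
   same value if their flags agree as well as their operands.  */
static bool max_op_equal(const struct max_op *a, const struct max_op *b)
{
    return a->lhs == b->lhs && a->rhs == b->rhs && a->flags == b->flags;
}

/* Optimization side: when two operations are combined, only the
   relaxations both of them allow may be kept.  Dropping all flags would
   also be correct but loses optimization.  */
static unsigned merge_flags(unsigned a, unsigned b)
{
    return a & b;
}
```

Every place that compares, combines or rewrites such an operation needs an
equivalent of one of these two checks, which is where the impact on the
code base comes from.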
