Hi Andrew,

Thanks for the detailed explanation and for pointing me to PR123113 and
the related patch series.

I checked the scope of PR123113 more carefully. My understanding is that
those patches mainly relax existing restrictions in phiopt / cselim /
factor_out_conditional_operation around merge blocks with more than two
predecessors. That certainly seems relevant to part of the problem. 
(Inbox<https://inbox.sourceware.org/gcc-bugs/?t=20251215070155&utm_source=chatgpt.com>)

However, I do not think those patches fully cover the motivating case I
had in mind.

The vectorized example you showed has the form

sqrt(expr1[i])
sqrt(expr2[i])
sqrt(expr3[i])


where expr1/expr2/expr3 are already available values, so ifcvt can form

select(select(...))
sqrt(selected_value)


quite naturally.

In my motivating kernel, the operands of sqrt are themselves computed
inside the individual control-flow paths. In simplified form, the three
arms look more like

sqrt(complex_expr1)
sqrt(complex_expr2)
sqrt(complex_expr3)


where the third arm also contains extra intermediate computations before
the final squared-distance expression is formed.

Here is a reduced Compiler Explorer example that still does not get
converted into the desired “blend operands first, then perform one sqrt”
shape:

https://godbolt.org/z/savsxzaWz


So the transformation I am interested in is still closer to:

PHI <sqrt(a), sqrt(b), sqrt(c)>
    =>
sqrt(PHI <a, b, c>)


in cases where a/b/c may be branch-local computed expressions, not just
already-materialized values.

Put differently:

PR123113 improves factoring around multi-pred PHIs,
but does GCC still lack a stronger common pure-operation
factoring/sinking transformation for branch-local computed operands?


I would be interested in your thoughts on whether this stronger form is
considered worthwhile in GCC, and if so, whether you would expect it to
belong in phiopt, ifcvt, or some other middle-end transformation.

Thanks again for the clarification.

Best regards,
Guoce Feng

---- Replied Message ----
From    Andrew 
Pinski<[email protected]><mailto:[email protected]>
Date    5/18/2026 16:49
To      Guoce Feng<[email protected]><mailto:[email protected]>
Cc      [email protected]<[email protected]>,
<mailto:[email protected]>Richard Biener<[email protected]>,
<mailto:[email protected]>Andrew Pinski<[email protected]>,
<mailto:[email protected]>Jeff 
Law<[email protected]><mailto:[email protected]>
Subject Re: [RFC] Factoring common pure operations across PHIs / branches to 
reduce duplicated sqrt calls

On Mon, May 18, 2026 at 1:16 AM Guoce Feng via Gcc <[email protected]> wrote:

 Hi,

 I am investigating an optimization opportunity in GCC related to
 factoring common pure operations out of control-flow joins.

 Consider code of this general form:

 if (c1 <= 0.0)
   distance = sqrt(expr1);
 else if (c2 <= c1)
   distance = sqrt(expr2);
 else
   distance = sqrt(expr3);


 After control-flow simplification / if-conversion, this can effectively
 lead to a form resembling:

 distance = select(cond1,
                   sqrt(expr1),
                   select(cond2, sqrt(expr2), sqrt(expr3)));


 For vectorized code, this may become "three vector sqrt operations plus
 blends", while a more profitable form would be:

 merged_expr = select(cond1,
                      expr1,
                      select(cond2, expr2, expr3));
 distance = sqrt(merged_expr);


 That is, transform a conceptual pattern like

 phi(sqrt(a), sqrt(b), sqrt(c))


 into

 sqrt(phi(a, b, c))


 when the operation is safe to factor out and doing so is profitable.

 I noticed that LLVM has a related transformation in SimplifyCFG under
 the sink-common-insts machinery. In GCC, I am trying to understand what
 the most appropriate place would be for a similar optimization.

 Two possible implementation directions seem plausible:

   1.  Extend the existing PHI factoring logic in
 tree-ssa-phiopt.cc, especially around
 factor_out_conditional_operation, to also handle suitable
 side-effect-free builtins / calls such as sqrt under the
 appropriate semantic restrictions.

So part of the problem with the above case at -Ofast is really
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123113 .
I have a patch set which is mostly approved but I need to double check
on the order of basic-blocks being looked at.
The patches are:
https://inbox.sourceware.org/gcc-patches/[email protected]/
https://inbox.sourceware.org/gcc-patches/[email protected]/
https://patchwork.sourceware.org/project/gcc/patch/[email protected]/

In that order. (I can't seen to find the 3rd patch on inbox for some reason).
With those patches we get:
```
  <bb 2> [local count: 1073741824]:
  if (c1_2(D) <= 0.0)
    goto <bb 3>; [41.00%]
  else
    goto <bb 4>; [59.00%]

  <bb 3> [local count: 440234144]:
  distance_7 = __builtin_sqrt (expr1_6(D)); [tail call]
  goto <bb 7>; [100.00%]

  <bb 4> [local count: 633507680]:
  if (c1_2(D) >= c2_3(D))
    goto <bb 5>; [50.00%]
  else
    goto <bb 6>; [50.00%]

  <bb 5> [local count: 316753840]:

  <bb 6> [local count: 633507680]:
  # _8 = PHI <expr3_4(D)(4), expr2_5(D)(5)>
  distance_10 = __builtin_sqrt (_8); [tail call]

  <bb 7> [local count: 1073741824]:
  # distance_1 = PHI <distance_10(6), distance_7(3)>
  return distance_1;
```

Extending factor_* to handle the above could be done but I am not sure
phiopt is the correct place.
Note there is code in the ifcvt that does a similar thing as factor_*
in phiopt which then does handle the rest of it and for:
```
double f(double *expr1, double *expr2, double *expr3, double *c1, double *c2)
{
double t = 0.0;
for(int i =0 ;i < 1024; i++)
{
  double distance = 0.0;
if (c1[i] <= 0.0)
  distance = __builtin_sqrt(expr1[i]);
else if (c2[i] <= c1[i])
  distance = __builtin_sqrt(expr2[i]);
else
  distance = __builtin_sqrt(expr3[i]);
t+=distance;
}
  return t;
}

```

With -mavx512f (since I was too lazy to figure out what expr* was and
too lazy to test non-x86_64 [since that is what I have built]) we get:
```
  _7 = _34 ? _14 : _12;
  _63 = _42 ? _6 : _7;
  distance_17 = __builtin_sqrt (_63);
  t_25 = distance_17 + t_29;
```
In ifcvt.

Which I assume is exactly what you were expecting.



   2.  Add a more general "common instruction sinking" transformation to
 tree-ssa-sink.cc, analogous in spirit to the existing common-store
 sinking support, but for a restricted class of pure scalar
 computations.

 My current intuition is that the PHIOPT direction may be a better fit
 for this specific pattern, because the optimization is fundamentally
 about factoring a common operation across PHI/merge structure rather
 than ordinary code sinking. However, I would appreciate guidance from
 the GCC community on the preferred design.

 I am especially interested in feedback on:

   *   Whether tree-ssa-phiopt.cc is indeed the right place for this form of
 transformation.

   *   Whether handling builtins such as sqrt is acceptable under the
 relevant floating-point / errno / exception semantics.

   *   Whether this should be restricted to cases where the operation is
 known to be side-effect-free and safe to move.

   *   Whether profitability should be guided primarily by code size,
 vectorization opportunities, or target-independent cost heuristics.

   *   Whether there is existing work or a PR in this area that I should
 build upon.

See above.


 A motivating reduced example comes from a geometric distance kernel in
 which three branch-local sqrt computations could ideally be turned into
 one sqrt after value merging, improving the shape of the vectorized code.
See above.

Thanks,
Andrea


 If this direction seems reasonable, I am interested in preparing a
 prototype patch and test cases.

 Thanks,
 Guoce Feng

Reply via email to