Re: [PATCH v4] ipa, cgraph: Enable constant propagation to OpenMP kernels

Josef Melcr Sat, 18 Oct 2025 03:20:10 -0700

Hi,

On 10/14/25 12:10, Tobias Burnus wrote:

But now to my testcase:
-----------------------
#include <math.h>
#include <stdio.h>


int main()
{
  double x;
//  #pragma omp target map(from: x)
  {
    x = 1.3547;
    #pragma omp parallel if(0)
      x = sin (x);
  }
  printf ("x = %f\n", x);
}
-----------------------


The propagation is not considered profitable enough:

-----

Evaluating opportunities for main._omp_fn.0/1.

- considering value 1.3547000000000000152766688188421539962291717529296875e+0 for param #0 struct .omp_data_s.0 & restrict, offset: 0 (caller_count: 1)
    good_cloning_opportunity_p (time: 1, size: 7, freq_sum: 1) -> evaluation: 142.86, threshold: 500
    good_cloning_opportunity_p (time: 1, size: 7, freq_sum: 1) -> evaluation: 142.86, threshold: 500


-----

You only find this out if you toggle on the detailed ipa-cp dump. If you add a large enough loop before entering the parallel region, the propagation does take place, so something like:


=====

#include <math.h>
#include <stdio.h>

int main()
{
  double x;
//  #pragma omp target map(from: x)
  for (int i = 0; i < 1000; i++)
  {
    x = 1.3547;
    #pragma omp parallel if(0)
        x = sin (x);
  }
  printf ("x = %f\n", x);
}

=====

In this case, the value will be propagated and gcc will remove the call to sin. This is of course not ideal, as we would want the first version to work too, but it's a larger ipa-cp issue. Hopefully we will fix it in the future.
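To make the numbers in the dump above easier to read, here is a rough sketch of how the printed evaluation relates to time, size and freq_sum. This is only a simplified model that happens to reproduce the 142.86 figure, not the actual code of good_cloning_opportunity_p; the function names and the exact formula are assumptions for illustration.

```c
/* Simplified model of the score printed in the ipa-cp dump.
   ASSUMPTION: evaluation = time * freq_sum * 1000 / size.  This is not
   GCC's real implementation, which takes more factors into account.  */
static double
cloning_evaluation (double time_benefit, double freq_sum, double size_cost)
{
  return time_benefit * freq_sum * 1000.0 / size_cost;
}

/* Hypothetical helper: a candidate is accepted once its evaluation
   reaches the threshold (500 in the dump above).  */
static int
looks_profitable (double evaluation, double threshold)
{
  return evaluation >= threshold;
}
```

With time 1, size 7 and freq_sum 1, this model gives roughly 142.86, which is below the threshold of 500. In the same model, a larger freq_sum (presumably what the surrounding loop provides) pushes the score over the threshold.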

As mentioned, I think it would be useful to have it working
without -flto for the common cases (cf. previous discussion).

That should be a really easy fix, it should be enough to tweak that if statement I mentioned earlier. I'll have the patch ready by the time this gets committed. :)

Handling assumptions - and ranges? - propagations could
be useful. For ranges, I was wondering about code like:
  double x = 1.0 + abs(y);
  ...
  if (x > 0.5)
which in principle appears in real-world code, but I am not
sure to what extent the knowledge of, e.g., '>= 1.0' will
really help in real-world code. Likewise for assumptions,
albeit
  if (...)
  else
     __builtin_unreachable ();
is at least somewhat common in GCC's own code ...

* * *

Ok, I'll look into assumptions next. Supporting them will also knock out a few things I want to get done anyway, like supporting arg modifications for ipa-sra.
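Just to have a concrete shape of the assumption pattern in mind, a minimal sketch in plain C could look like the following. The function name is hypothetical, and whether ipa-cp would actually carry such an assumption into an outlined OpenMP function is exactly the open question here, so this only shows the source-level idiom.

```c
/* Hypothetical example of an assumption expressed with
   __builtin_unreachable (GCC/Clang builtin).  */
static int
above_half (double x)
{
  /* Tell the compiler that x >= 1.0 always holds here; calling this
     function with x < 1.0 (or NaN) is undefined behavior.  */
  if (!(x >= 1.0))
    __builtin_unreachable ();

  /* With the assumption known, this comparison is always true and
     could in principle be folded to 1.  */
  return x > 0.5;
}
```

Any caller has to respect the assumption, e.g. above_half (1.0 + fabs (y)) is fine for any non-NaN y.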

While optimizing the host fallback is fine, what we actually want to
have is (also) an optimized device side. It seems as if we may need
to come up with some scheme which delays writing out the device side
to permit more optimizations, whether just pass reordering or
something else.

Actually, before writing out the device code the offload
table, actual cloning would be fine - this would then lead to
multiple host and device versions, but there is no reason
why that should be a problem as long as there is a one-to-one
relation between host and device version.

While making it work with something like pass reordering sounds more reasonable, could we bring the device code into LTO to optimize it further?


* * *

And, finally, if 'parallel' is optimized, more complex offload
kernels will profit - even if IPA does not work with
GOMP_target_ext.

Tobias

Best regards,

Josef
