Hi,

On 10/14/25 12:10, Tobias Burnus wrote:
But now to my testcase:
-----------------------
#include <math.h>
#include <stdio.h>

int main()
{
  double x;
//  #pragma omp target map(from: x)
  {
    x = 1.3547;
    #pragma omp parallel if(0)
      x = sin (x);
  }
  printf ("x = %f\n", x);
}
-----------------------

The propagation is not considered profitable enough:

-----

Evaluating opportunities for main._omp_fn.0/1.
 - considering value 1.3547000000000000152766688188421539962291717529296875e+0 for param #0 struct .omp_data_s.0 & restrict, offset: 0 (caller_count: 1)
     good_cloning_opportunity_p (time: 1, size: 7, freq_sum: 1) -> evaluation: 142.86, threshold: 500
     good_cloning_opportunity_p (time: 1, size: 7, freq_sum: 1) -> evaluation: 142.86, threshold: 500

-----

You only find this out if you enable the detailed ipa-cp dump (-fdump-ipa-cp-details).  If you add a large enough loop around the parallel region, the propagation does take place, so something like:

=====

#include <math.h>
#include <stdio.h>

int main()
{
  double x;
//  #pragma omp target map(from: x)
  for (int i = 0; i < 1000; i++)
  {
    x = 1.3547;
    #pragma omp parallel if(0)
        x = sin (x);
  }
  printf ("x = %f\n", x);
}

=====

In this case, the value is propagated and GCC removes the call to sin.  This is of course not ideal, since we want the propagation to happen without the loop as well, but it's a larger ipa-cp issue.  Hopefully we will fix it in the future.
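
To make the numbers in the dump above easier to read, here is a small sketch of the arithmetic they appear to follow.  It is inferred from the dump line only (1 * 1 / 7 * 1000 = 142.86), not copied from ipa-cp, and it ignores any penalties the real heuristic may apply.  The helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: recomputes the "evaluation" figure from the
   time, size and freq_sum values printed in the dump.  Assumption:
   evaluation = time * freq_sum / size * 1000, as the dump suggests.  */
static double
cloning_evaluation (double time_benefit, double freq_sum, double size_cost)
{
  assert (size_cost > 0.0);
  return time_benefit * freq_sum / size_cost * 1000.0;
}

/* Hypothetical helper: cloning is considered worthwhile once the
   evaluation reaches the threshold (500 in the dump above).  */
static bool
passes_threshold (double evaluation, double threshold)
{
  return evaluation >= threshold;
}
```

With time 1 and size 7, the evaluation only reaches 500 once freq_sum is at least 3.5, which is consistent with the call being propagated when it sits inside a hot loop.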

As mentioned, I think it would be useful to have it working
without -flto for the common cases (cf. previous discussion).
That should be a really easy fix; it should be enough to tweak the if statement I mentioned earlier.  I'll have the patch ready by the time this one gets committed. :)
Handling assumptions - and ranges? - propagations could
be useful. For ranges, I was wondering about code like:
  double x = 1.0 + abs(y);
  ...
  if (x > 0.5)
which in principle appears in real-world code, but I am not
sure to what extent the knowledge of, e.g., '>= 1.0' will
really help in real-world code. Likewise for assumptions,
albeit
  if (...)
  else
     __builtin_unreachable ();
is at least somewhat common in GCC's own code ...
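
A minimal sketch combining the two patterns above, for illustration only.  The function name is hypothetical, a hand-written absolute value replaces the library call so that the example does not depend on libm, and the assumption is only valid if y is never NaN.  Whether a given GCC version actually removes the second branch is not claimed here.

```c
#include <assert.h>

/* Hand-written absolute value, to keep the sketch free of libm.  */
static double
abs_double (double y)
{
  return y < 0 ? -y : y;
}

/* Hypothetical example: x is known to be >= 1.0, so the later
   'x > 0.5' test is always true.  */
static int
classify (double y)
{
  double x = 1.0 + abs_double (y);

  /* Assumption stated to the compiler; only valid when y is not NaN.  */
  if (!(x >= 1.0))
    __builtin_unreachable ();

  if (x > 0.5)
    return 1;   /* always taken under the assumption above */
  return 0;     /* dead code if the range '>= 1.0' is propagated */
}
```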

* * *
Ok, I'll look into assumptions next.  Supporting them will also knock out a few things I want to get done anyway, like handling argument modifications, which is needed to support ipa-sra.
While optimizing the host fallback is fine, what we actually want to have is (also) an optimized device side. It seems as if we may need to come up with some scheme which delays writing out the device side to permit more optimizations, whether
just pass reordering or something else.

Actually, before writing out the device code and the offload
table, actual cloning would be fine - this would then lead to
multiple host and device versions, but there is no reason
why that should be a problem as long as there is a one-to-one
relation between host and device version.
While making it work with something like pass reordering sounds more reasonable, could we bring the device code into LTO to optimize it further?

* * *

And, finally, if 'parallel' is optimized, more complex offload
kernels will profit - even if IPA does not work with
GOMP_target_ext.

Tobias

Best regards,

Josef
