On 2/16/23 08:55, Richard Biener wrote:
On Wed, Feb 15, 2023 at 6:07 PM Andrew MacLeod via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:

This patch implements the suggestion that we have an alternative
ssa-cache which does not zero memory, and instead uses a bitmap to track
whether a value is currently set or not.  It roughly mimics what
path_range_query was doing internally.

For sparsely used cases, especially in large programs, this is more
efficient.  I changed path_range_query to use this, and removed its old
bitmap (and a hack or two around PHI calculations), and also utilized
this in the assume_query class.
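
To illustrate the idea described above, here is a minimal sketch of a cache that never zeroes its slot storage and instead tracks validity in a bitmap. It is not the patch's code: `LazyCache` and `Range` are hypothetical names, and `std::vector<bool>` merely stands in for GCC's sparse bitmap.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a range value; not GCC's vrange.
struct Range { long lo; long hi; };

// Sketch: slot storage is allocated but never zeroed.  A bitmap records
// which slots currently hold a valid value.
class LazyCache {
public:
  explicit LazyCache(std::size_t n)
    : m_slots(new Range[n]),   // default-initialized: contents indeterminate
      m_active(n, false) {}

  // Return true if slot V currently holds a value.
  bool has(std::size_t v) const { return v < m_active.size() && m_active[v]; }

  // Copy the cached value into R and return true if slot V is set.
  bool get(std::size_t v, Range &r) const {
    if (!has(v))
      return false;
    r = m_slots[v];
    return true;
  }

  // Store R in slot V.  Returns true if a value was already present.
  bool set(std::size_t v, const Range &r) {
    assert(v < m_active.size());
    bool existed = m_active[v];
    m_slots[v] = r;
    m_active[v] = true;
    return existed;
  }

  // Forget slot V; only the bit is cleared, the slot is left as-is.
  void clear(std::size_t v) {
    if (v < m_active.size())
      m_active[v] = false;
  }

private:
  std::unique_ptr<Range[]> m_slots;
  std::vector<bool> m_active;
};
```

A slot is only ever read after its bit has been set, so the uninitialized storage is never observed.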

Performance wise, the patch doesn't affect VRP (since that still uses
the original version).  Switching to the lazy version caused a slowdown
of 2.5% across VRP.

There was a noticeable improvement elsewhere: across 230 GCC source
files, threading ran over 12% faster!  Overall compilation improved by
0.3%.  Not sure it makes much difference in compiler.i, but it shouldn't
hurt.

Bootstraps on x86_64-pc-linux-gnu with no regressions.  OK for trunk?
Or do you want to wait for the next release...

I see

@@ -365,16 +335,8 @@ path_range_query::compute_ranges_in_phis (basic_block bb)

        Value_Range r (TREE_TYPE (name));
        if (range_defined_in_block (r, name, bb))
-       {
-         unsigned v = SSA_NAME_VERSION (name);
-         set_cache (r, name);
-         bitmap_set_bit (phi_set, v);
-         // Pretend we don't have a cache entry for this name until
-         // we're done with all PHIs.
-         bitmap_clear_bit (m_has_cache_entry, v);
-       }
+       m_cache.set_global_range (name, r);
      }
-  bitmap_ior_into (m_has_cache_entry, phi_set);
  }

  // Return TRUE if relations may be invalidated after crossing edge E.

which I think is not correct - if we have

  # _1 = PHI <..., _2>
  # _2 = PHI <..., _1>

then their effects are supposed to be executed in parallel, that is,
both PHI argument _2 and _1 are supposed to see the "old" version.
The previous code tried to make sure the range of the new _1 doesn't
get seen when processing the argument _1 in the definition of _2.
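
The parallel semantics described above can be illustrated with a toy model. This is only a sketch: `Phi`, `eval_parallel` and `eval_sequential` are hypothetical names, and each PHI is reduced to copying the value of a single source name.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical model: each PHI copies the value of one source name.
struct Phi { std::size_t dest; std::size_t src; };

// Evaluate all PHIs as if in parallel: read every source from the old
// state first, then commit the results.
std::vector<int> eval_parallel(const std::vector<Phi> &phis,
                               std::vector<int> vals) {
  std::vector<std::pair<std::size_t, int>> pending;
  for (const Phi &p : phis) {
    assert(p.src < vals.size() && p.dest < vals.size());
    pending.emplace_back(p.dest, vals[p.src]);
  }
  for (const auto &entry : pending)
    vals[entry.first] = entry.second;
  return vals;
}

// Naive sequential evaluation: a later PHI may observe a value that an
// earlier PHI has already overwritten.
std::vector<int> eval_sequential(const std::vector<Phi> &phis,
                                 std::vector<int> vals) {
  for (const Phi &p : phis) {
    assert(p.src < vals.size() && p.dest < vals.size());
    vals[p.dest] = vals[p.src];
  }
  return vals;
}
```

With names 1 and 2 holding 10 and 20 and the two PHIs swapping them, `eval_parallel` yields 20 and 10, while `eval_sequential` yields 20 and 20 because the second PHI sees the new value of name 1.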

Yes, the effects should appear in parallel, but ssa_range_in_phi(), which is the only thing range_defined_in_block does for PHIs, is guaranteed not to do any additional cache lookups. The comment there should be adjusted to make this clear:

// Since PHIs are calculated in parallel at the beginning of the
// block, we must be careful to never save anything to the cache here.
// It is the caller's responsibility to adjust the cache.  Also,
// calculating the PHI's range must not trigger additional lookups.

We should instead say:

"we must be careful to never set or access the cache here"...

This was the original intent, but a subtle access to the cache crept in here:

      // Try to fold the phi exclusively with global or cached values.
      // This will get things like PHI <5(99), 6(88)>.  We do this by
      // calling range_of_expr with no context.
      unsigned nargs = gimple_phi_num_args (phi);
      Value_Range arg_range (TREE_TYPE (name));
      r.set_undefined ();
      for (size_t i = 0; i < nargs; ++i)
        {
          tree arg = gimple_phi_arg_def (phi, i);
          if (range_of_expr (arg_range, arg, /*stmt=*/NULL))

This range_of_expr call will indeed access the cache incorrectly, but Andrew fixed that here:

@@ -264,7 +236,7 @@ path_range_query::ssa_range_in_phi (vrange &r, gphi *phi)
       for (size_t i = 0; i < nargs; ++i)
        {
          tree arg = gimple_phi_arg_def (phi, i);
-         if (range_of_expr (arg_range, arg, /*stmt=*/NULL))
+         if (m_ranger.range_of_expr (arg_range, arg, /*stmt=*/NULL))
            r.union_ (arg_range);
          else
            {

...thus ensuring that function never uses the cache. All the lookups are done with the global ranger at either the path entry or globally as above (with stmt=NULL).

I believe the switch from range_of_expr to m_ranger.range_of_expr is safe, as the original code was added to handle silly things like PHI <5(99), 6(88)> which shouldn't need path aware ranges.
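
As a rough illustration of why writing the cache during PHI processing is harmless when the fold bypasses it, consider the sketch below. It is a simplified model under stated assumptions, not GCC code: `GlobalQuery`, `PathQuery`, `set_cache` and `fold_phi` are hypothetical names, and a range union is approximated by the convex hull of two intervals.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a range value; not GCC's vrange.
struct Range { long lo; long hi; };

// Hypothetical context-free query: knows nothing about the path cache.
struct GlobalQuery {
  std::vector<Range> ranges;
  Range range_of(std::size_t name) const { return ranges[name]; }
};

class PathQuery {
public:
  explicit PathQuery(const GlobalQuery &g) : m_global(g) {}

  void set_cache(std::size_t name, Range r) { m_cache[name] = r; }

  // Path-aware lookup: prefers the cache, falls back to the global query.
  Range range_of(std::size_t name) const {
    auto it = m_cache.find(name);
    return it != m_cache.end() ? it->second : m_global.range_of(name);
  }

  // Fold a PHI from its arguments using only the global query, so cache
  // entries written for other PHIs in the same block cannot leak in.
  Range fold_phi(const std::vector<std::size_t> &args) const {
    assert(!args.empty());
    Range r = m_global.range_of(args[0]);
    for (std::size_t i = 1; i < args.size(); ++i) {
      Range a = m_global.range_of(args[i]);
      r.lo = std::min(r.lo, a.lo);   // hull used as a simplified union
      r.hi = std::max(r.hi, a.hi);
    }
    return r;
  }

private:
  const GlobalQuery &m_global;
  std::unordered_map<std::size_t, Range> m_cache;
};
```

In this model, calling `set_cache` for one PHI result before folding the next PHI does not change what `fold_phi` returns, which is the property the explanation above relies on.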

As you've found out, the update to the cache in this case was not obvious at all. Perhaps it should also be commented:

"It is safe to set the cache here, as range_defined_in_block for PHIs (ssa_range_in_phi) is guaranteed not to do any cache lookups."


The new version drops this, possibly resulting in wrong-code.

While I think it's appropriate to sort out compile-time issues like this
during stage4, at least the above makes me think it should be deferred
to next stage1.

I defer to the release managers as to whether this is safe in light of my explanation above :).

Aldy
