On 2/16/23 08:55, Richard Biener wrote:
On Wed, Feb 15, 2023 at 6:07 PM Andrew MacLeod via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
This patch implements the suggestion that we have an alternative
ssa-cache which does not zero memory, and instead uses a bitmap to track
whether a value is currently set or not. It roughly mimics what
path_range_query was doing internally.
For sparsely used cases, especially in large programs, this is more
efficient. I changed path_range_query to use this, and removed its old
bitmap (and a hack or two around PHI calculations), and also utilized
this in the assume_query class.
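To illustrate the idea described above, here is a minimal sketch of a cache whose value slots are never zeroed, with a separate bitmap recording which slots are valid. It is not the code from the patch: `IntRange` and `LazyCache` are hypothetical names, and `std::vector<bool>` merely stands in for GCC's bitmap type.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a range; this is not GCC's vrange.
struct IntRange { long lo; long hi; };

// Hypothetical lazy cache: the slot array is allocated but never cleared.
// A slot is only meaningful when its bit in m_valid is set.
class LazyCache
{
public:
  explicit LazyCache (std::size_t n)
    : m_slots (new IntRange[n]),   // default-initialized, contents indeterminate
      m_valid (n, false)           // one bit per slot instead of a zeroed slot
  {}

  // Return true and fill OUT only if VERSION has been set.
  bool get (std::size_t version, IntRange &out) const
  {
    if (version >= m_valid.size () || !m_valid[version])
      return false;
    out = m_slots[version];
    return true;
  }

  // Store R for VERSION and mark the slot valid.
  void set (std::size_t version, const IntRange &r)
  {
    assert (version < m_valid.size ());
    m_slots[version] = r;
    m_valid[version] = true;
  }

  // Forget VERSION by clearing its bit; the slot itself is left untouched.
  void clear (std::size_t version)
  {
    if (version < m_valid.size ())
      m_valid[version] = false;
  }

private:
  std::unique_ptr<IntRange[]> m_slots;
  std::vector<bool> m_valid;
};
```

In this sketch the only per-entry initialization cost is one bit, which is the property that should help sparsely populated caches; a dense user that touches most entries pays the extra bitmap test on every lookup.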
Performance-wise, the patch doesn't affect VRP (since that still uses
the original version). Switching VRP to the lazy version caused a
slowdown of 2.5% across VRP.
There was a noticeable improvement elsewhere: across 230 GCC source
files, threading ran over 12% faster! Overall compilation improved by
0.3%. Not sure it makes much difference in compiler.i, but it shouldn't
hurt.
Bootstraps on x86_64-pc-linux-gnu with no regressions. OK for trunk?
Or do you want to wait for the next release...
I see
@@ -365,16 +335,8 @@ path_range_query::compute_ranges_in_phis (basic_block bb)
Value_Range r (TREE_TYPE (name));
if (range_defined_in_block (r, name, bb))
- {
- unsigned v = SSA_NAME_VERSION (name);
- set_cache (r, name);
- bitmap_set_bit (phi_set, v);
- // Pretend we don't have a cache entry for this name until
- // we're done with all PHIs.
- bitmap_clear_bit (m_has_cache_entry, v);
- }
+ m_cache.set_global_range (name, r);
}
- bitmap_ior_into (m_has_cache_entry, phi_set);
}
// Return TRUE if relations may be invalidated after crossing edge E.
which I think is not correct - if we have
# _1 = PHI <..., _2>
# _2 = PHI <..., _1>
then their effects are supposed to be executed in parallel, that is,
both PHI argument _2 and _1 are supposed to see the "old" version.
The previous code tried to make sure the range of the new _1 doesn't
get seen when processing the argument _1 in the definition of _2.
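The hazard can be shown with a toy model. The sketch below is purely illustrative: `State`, `eval_phis_parallel` and `eval_phis_sequential` are hypothetical names, and each SSA name is modelled as a single int rather than a range. It evaluates the two PHIs above along the back edge, once with parallel semantics and once letting the second PHI see the first PHI's new value.

```cpp
#include <cassert>

// Hypothetical model: the values of _1 and _2 on entry to the block.
struct State { int v1; int v2; };

// Parallel semantics: both PHIs read the "old" values.
//   _1 = PHI <..., _2>
//   _2 = PHI <..., _1>
State eval_phis_parallel (State old_state)
{
  State next;
  next.v1 = old_state.v2;   // new _1 comes from old _2
  next.v2 = old_state.v1;   // new _2 comes from old _1
  return next;
}

// Incorrect sequential evaluation: the second PHI observes the value
// just written by the first one.
State eval_phis_sequential (State s)
{
  s.v1 = s.v2;   // new _1 comes from old _2
  s.v2 = s.v1;   // new _2 wrongly comes from the new _1
  return s;
}
```

Starting from `_1 = 1, _2 = 2`, the parallel version swaps the two values, while the sequential version leaves both names equal to 2. A cache that exposes the new range of `_1` while `_2` is still being processed would behave like the second function.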
Yes, the effects should appear in parallel, but ssa_range_in_phi(), which
is the only thing range_defined_in_block does for PHIs, is guaranteed
not to do any additional cache lookups. The comment there should be
adjusted to make this clear:
// Since PHIs are calculated in parallel at the beginning of the
// block, we must be careful to never save anything to the cache here.
// It is the caller's responsibility to adjust the cache. Also,
// calculating the PHI's range must not trigger additional lookups.
We should instead say:
"we must be careful to never set or access the cache here"...
This was the original intent, but a subtle access to the cache crept in
here:
// Try to fold the phi exclusively with global or cached values.
// This will get things like PHI <5(99), 6(88)>. We do this by
// calling range_of_expr with no context.
unsigned nargs = gimple_phi_num_args (phi);
Value_Range arg_range (TREE_TYPE (name));
r.set_undefined ();
for (size_t i = 0; i < nargs; ++i)
{
tree arg = gimple_phi_arg_def (phi, i);
if (range_of_expr (arg_range, arg, /*stmt=*/NULL))
This range_of_expr call will indeed access the cache incorrectly, but
Andrew fixed that here:
@@ -264,7 +236,7 @@ path_range_query::ssa_range_in_phi (vrange &r, gphi *phi)
for (size_t i = 0; i < nargs; ++i)
{
tree arg = gimple_phi_arg_def (phi, i);
- if (range_of_expr (arg_range, arg, /*stmt=*/NULL))
+ if (m_ranger.range_of_expr (arg_range, arg, /*stmt=*/NULL))
r.union_ (arg_range);
else
{
...thus ensuring that function never uses the cache. All the lookups
are done with the global ranger at either the path entry or globally as
above (with stmt=NULL).
I believe the switch from range_of_expr to m_ranger.range_of_expr is
safe, as the original code was added to handle silly things like PHI
<5(99), 6(88)> which shouldn't need path aware ranges.
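A rough sketch of why this ordering is safe, under simplifying assumptions: ranges are modelled as plain intervals, the union is a convex hull, and `RangeTable` and `phi_range_from_globals` are hypothetical names, not GCC's API. The point is only that the PHI fold reads from a context-free "global" table and never from the path cache that the caller writes to.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <optional>
#include <vector>

// Hypothetical interval range; not GCC's vrange.
struct IntRange { long lo; long hi; };

// Hypothetical table mapping an SSA version number to its range.
using RangeTable = std::map<int, IntRange>;

// Fold a PHI by unioning the ranges of its arguments, consulting only
// GLOBAL.  The path cache is deliberately not a parameter, so this
// function cannot observe ranges stored for other PHIs in the block.
// Returns nullopt if any argument has no global range.
std::optional<IntRange>
phi_range_from_globals (const std::vector<int> &args, const RangeTable &global)
{
  std::optional<IntRange> r;
  for (int arg : args)
    {
      auto it = global.find (arg);
      if (it == global.end ())
        return std::nullopt;
      if (!r)
        r = it->second;
      else
        {
          // Convex hull as a simplified union of two intervals.
          r->lo = std::min (r->lo, it->second.lo);
          r->hi = std::max (r->hi, it->second.hi);
        }
    }
  return r;
}
```

With this shape, the caller can store the result for `_1` in the path cache immediately and then process `_2`: the second fold still sees only the global range of `_1`, so the PHIs behave as if evaluated in parallel.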
As you've found out, the update to the cache in this case was not
obvious at all. Perhaps it should also be commented:
"It is safe to set the cache here, as range_defined_in_block for PHIs
(ssa_range_in_phi) is guaranteed not to do any cache lookups."
The new version drops this, possibly resulting in wrong-code.
While I think it's appropriate to sort out compile-time issues like this
during stage4, at least the above makes me think it should be deferred
to next stage1.
I defer to the release managers as to whether this is safe in light of
my explanation above :).
Aldy