Re: [PATCH][ARM] Enable code hoisting with -Os (PR80155)

Richard Biener Thu, 19 Sep 2019 03:47:11 -0700

On Thu, Sep 19, 2019 at 3:28 AM Prathamesh Kulkarni
<[email protected]> wrote:
>
> On Wed, 18 Sep 2019 at 22:17, Prathamesh Kulkarni
> <[email protected]> wrote:
> >
> > On Wed, 18 Sep 2019 at 01:46, Richard Biener <[email protected]> 
> > wrote:
> > >
> > > On Tue, Sep 17, 2019 at 7:18 PM Wilco Dijkstra <[email protected]> 
> > > wrote:
> > > >
> > > > Hi Richard,
> > > >
> > > > > The issue with the bugzilla is that it lacked appropriate testcase(s) 
> > > > > and thus
> > > > > it is now a mess.  There are clear testcases (maybe not in the 
> > > > > benchmarks you
> > > >
> > > > Agreed - it's not clear whether any of the proposed changes would 
> > > > actually
> > > > help the original issue. My patch absolutely does.
> > > >
> > > > > care about) that benefit from code hoisting as enabler, mainly when 
> > > > > control
> > > > > flow can be then converted to data flow.  Also note that "size 
> > > > > optimizations"
> > > > > are important for all cases where followup transforms have size 
> > > > > limits on the IL
> > > > > in place.
> > > >
> > > > The gain from -fcode-hoisting is about 0.2% overall on Thumb-2. Ie. 
> > > > it's definitely
> > > > useful, but there are much larger gains to be had from other tweaks 
> > > > [1]. So we can
> > > > live without it until a better solution is found.
> > >
> > > A "solution" for better eembc benchmark results?
> > >
> > > The issues are all latent even w/o code-hoisting since you can do the
> > > same transform at the source level.  Which is usually why I argue
> > > trying to fix this in code-hoisting is not a complete fix.  Nor is turning
> > > off random GIMPLE passes for specific benchmark regressions.
> > >
> > > Anyway, it's arm maintainers call if you want to have such hacks in
> > > place or not.
> > >
> > > As a release manager I say that GCC isn't a benchmark compiler.
> > >
> > > As the one "responsible" for the code-hoisting introduction I say that
> > > as long as I don't have access to the actual benchmark I can't assess
> > > wrongdoing of the pass nor suggest an appropriate place for optimization.
> > Hi Richard,
> > The actual benchmark function for PR80155 is almost identical to FMS()
> > function defined in
> > pr77445-2.c, and the test-case reproduces the same issue as in the 
> > benchmark.
> Hi,
> The attached patch is another workaround for hoisting. The rationale
> behind the patch is
> to avoid "long range" hoistings  for a "large enough" CFG.


But that isn't a good heuristic.  The issue with coming up with a good heuristic
is that we'd need to evaluate the effect on register pressure which, on GIMPLE
at least, can be only approximated.  The insertion phase of PRE isn't a good
place to do that, the only place you could reasonably decide on this is
the elimination phase which sees the earlier inserted load and increment
and needs to decide whether to replace the later load and increment with
the loaded value.  But in the end this boils down to do scheduling and
register rematerialization on GIMPLE based on the value graph which is
of course possible but not implemented and if implemented is probably
better done as a standalone pass.

> The patch walks dom tree for the block and finds the "furthest" dom
> block, for which
> intersect_p (availout_in_some, AVAIL_OUT (dom_block)) is true. The
> "distance" is measured
> in terms of dom depth.
>
> We can have two params say param_hoist_n_bbs to determine "large enough"
> CFG, and param_hoist_expr_dist, that avoids hoisting if the "distance"
> exceeds the param threshold.
> For the values (hardcoded) in the patch, it "works" for avoiding the
> spill and does not regress ssa-pre-*.c and ssa-hoist-*.c
> tests (the param values could be made target-specific). Does the
> approach look reasonable ?

As said, not really.  First you only see parts of the expression hoisted;
in the testcase it's *transitions + 1 where you first see _1 = *transition
and then _2 = _1 + 1; if it were *transitions + _3 and _3 dies here then
hoisting doesn't make register pressure worse; if it were _4 + _3 and
both operands die here then hoisting decreases register pressure.

On x86 I don't see any spilling so the other question is whether this
is really a register pressure issue?

> My concern with current version is that walking dom tree per block in
> do_hoist_insertion can end up being
> pretty expensive. Any suggestions on how we could speed that up ?

You can probably use the DFS numbers directly?

> Alternatively, we could consider hoisting from "bottom up", one block
> at a time, and keep a map to count number of
> times an expr is hoisted and refuse hoisting the expr further up if it
> exceeds target-specific param threshold ?

Hoisting piggy-backs on the ANTIC data flow solution so that doesn't work
in the current framework.  In fact the current framework relies on top-down
operation to not hoist things one at a time.

Richard.

> Thanks,
> Prathamesh
> >
> > Thanks,
> > Prathamesh
> > >
> > > Richard.
> > >
> > > >
> > > > [1] https://gcc.gnu.org/ml/gcc-patches/2019-07/msg01739.html
> > > >
> > > > Wilco

Re: [PATCH][ARM] Enable code hoisting with -Os (PR80155)

Reply via email to