Re: [PR72835] Incorrect arithmetic optimization involving bitfield arguments

kugan Mon, 19 Sep 2016 16:33:34 -0700

Hi Richard,
Thanks for the review.

On 19/09/16 23:40, Richard Biener wrote:

On Sun, Sep 18, 2016 at 10:21 PM, kugan
<kugan.vivekanandara...@linaro.org> wrote:

Hi Richard,



On 14/09/16 21:31, Richard Biener wrote:


On Fri, Sep 2, 2016 at 10:09 AM, Kugan Vivekanandarajah
<kugan.vivekanandara...@linaro.org> wrote:


Hi Richard,

On 25 August 2016 at 22:24, Richard Biener <richard.guent...@gmail.com>
wrote:


On Thu, Aug 11, 2016 at 1:09 AM, kugan
<kugan.vivekanandara...@linaro.org> wrote:


Hi,


On 10/08/16 20:28, Richard Biener wrote:



On Wed, Aug 10, 2016 at 10:57 AM, Jakub Jelinek <ja...@redhat.com>
wrote:



On Wed, Aug 10, 2016 at 08:51:32AM +1000, kugan wrote:



I see it now. The problem is that we only look at whether (-1) is in the
ops list when deciding what to pass as changed to rewrite_expr_tree in the
case of multiplication by negate.  If we have combined the (-1), as in the
testcase, the (-1) is no longer there and we pass changed=false to
rewrite_expr_tree.

We should set changed based on what happens in try_special_add_to_ops.
The attached patch does this. Bootstrap and regression testing are ongoing.
Is this OK for trunk if there are no regressions?




I think the bug is elsewhere.  In particular in
undistribute_ops_list/zero_one_operation/decrement_power.
All those look problematic in this regard, they change RHS of statements
to something that holds a different value, while keeping the LHS.
So, generally you should instead just add a new stmt next to the old one,
and adjust data structures (replace the old SSA_NAME in some ->op with
the new one).  decrement_power might be a problem here, dunno if all the
builtins are const in all cases that DSE would kill the old one,
Richard, any preferences for that?  reset flow sensitive info + reset debug
stmt uses, or something different?  Though, replacing the LHS with a new
anonymous SSA_NAME might be needed too, in case it is before SSA_NAME of a
user var that doesn't yet have any debug stmts.




I'd say replacing the LHS is the way to go, with calling the appropriate
helper on the old stmt to generate a debug stmt for it / its uses (would
need to look it up here).


Here is an attempt to fix it. The problem arises when, in
undistribute_ops_list, we linearize_expr_tree such that a NEGATE_EXPR is
added as (-1) MULT_EXPR (OP). The real problem starts when we handle this
in zero_one_operation. Unlike what was done earlier, we now change the stmt
(with propagate_op_to_single_use or directly) such that the value computed
by the stmt is no longer what it used to be. Because of this, what is
computed in undistribute_ops_list and rewrite_expr_tree also changes.

undistribute_ops_list already expects this, but rewrite_expr_tree will not
if we don't pass changed as an argument.

The way I am fixing this now is: in linearize_expr_tree, I set ops_changed
to true if we change NEGATE_EXPR to (-1) MULT_EXPR (OP). Then, when we call
zero_one_operation with ops_changed = true, I replace all the LHS in
zero_one_operation with new SSA names and replace all the uses. I also call
rewrite_expr_tree with changed = false in this case.

Does this make sense? Bootstrapped and regression tested for
x86_64-linux-gnu without any new regressions.



I don't think this solves the issue.  zero_one_operation associates the
chain starting at the first *def and it will change the intermediate values
of _all_ of the stmts visited until the operation to be removed is found.
Note that this is independent of whether try_special_add_to_ops did
anything.

Even for the regular undistribution cases we get this wrong.

So we need to back-track in zero_one_operation, replacing each LHS
and in the end the op in the opvector of the main chain.  That's basically
the same as if we'd do a regular re-assoc operation on the sub-chains.
Take their subops, simulate zero_one_operation by
appending the cancelling operation and optimizing the oplist, and then
materializing the associated ops via rewrite_expr_tree.

Here is a draft patch which records the stmt chain while in
zero_one_operation and then fixes it when OP is removed. When we
update *def, that will update the ops vector. Does this look sane?
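
As a rough illustration of the idea only (a toy model, not the reassoc
code: Stmt, remove_factor and the "_new" naming are made up here, and each
statement is just lhs = rhs1 * rhs2), recording the chain and then giving
every changed statement a fresh LHS could look like this:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Toy statement: lhs = rhs1 * rhs2.  Hypothetical, not GCC's gimple.
struct Stmt {
  std::string lhs, rhs1, rhs2;
};

// Remove one factor `op` from the product chain ending in `def`.
// Every statement whose value changes gets a fresh LHS name, so no
// existing name silently starts to hold a different value.
// Returns the new name of the chain result, or `def` if op was not found.
std::string remove_factor(std::vector<Stmt> &stmts, const std::string &def,
                          const std::string &op) {
  assert(!op.empty());
  std::map<std::string, std::size_t> def_of;
  for (std::size_t i = 0; i < stmts.size(); ++i)
    def_of[stmts[i].lhs] = i;

  std::vector<std::size_t> to_fix;  // statements visited on the way down
  std::string cur = def;
  bool found = false;
  while (def_of.count(cur)) {
    std::size_t i = def_of[cur];
    to_fix.push_back(i);
    if (stmts[i].rhs1 == op) { stmts[i].rhs1 = "1"; found = true; break; }
    if (stmts[i].rhs2 == op) { stmts[i].rhs2 = "1"; found = true; break; }
    cur = stmts[i].rhs1;  // follow the chain through the first operand
  }
  if (!found)
    return def;  // nothing was modified

  // Walk back up from the innermost statement, renaming each LHS and
  // patching its use in the statement above it.
  std::string prev_old, prev_new;
  for (std::size_t k = to_fix.size(); k-- > 0;) {
    Stmt &s = stmts[to_fix[k]];
    if (!prev_old.empty() && s.rhs1 == prev_old)
      s.rhs1 = prev_new;
    prev_old = s.lhs;
    prev_new = s.lhs + "_new";
    s.lhs = prev_new;
  }
  return prev_new;
}
```

The caller would then store the returned name wherever it kept the old
one, which is the toy counterpart of updating *def in the ops vector.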



Yes.  A few comments below

+  /* PR72835 - Record the stmt chain that has to be updated such that
+     we dont use the same LHS when the values computed are different.  */
+  auto_vec<gimple *> stmts_to_fix;

Use auto_vec<gimple *, 64> here so that we get stack allocation only,
most of the time.


Done.

          if (stmt_is_power_of_op (stmt, op))
            {
+             make_new_ssa_for_all_defs (def, op, stmts_to_fix);
              if (decrement_power (stmt) == 1)
                propagate_op_to_single_use (op, stmt, def);

For the cases where you end up with propagate_op_to_single_use, its
argument stmt is handled superfluously in the new SSA making; I suggest
popping it from the stmts_to_fix vector in that case.  I also suggest
using break; instead of return in all cases and doing the
make_new_ssa_for_all_defs call at the function end instead.

Done.

@@ -1253,14 +1305,18 @@ zero_one_operation (tree *def, enum tree_code opcode, tree op)
              if (gimple_assign_rhs1 (stmt2) == op)
                {
                  tree cst = build_minus_one_cst (TREE_TYPE (op));
+                 stmts_to_fix.safe_push (stmt2);
+                 make_new_ssa_for_all_defs (def, op, stmts_to_fix);
                  propagate_op_to_single_use (cst, stmt2, def);
                  return;

this safe_push should be unnecessary for the above reason (others are
conditionally unnecessary).

Done.

Bootstrapped and regression tested on x86_64-linux-gnu with no new
regressions. Is this OK?


+static void
+make_new_ssa_for_all_defs (tree *def, tree op,
+               auto_vec<gimple *, 64> &stmts_to_fix)

I think you need to use vec<gimple *> &stmts_to_fix here AFAIK.


This is what I had. With that I get:

error: invalid initialization of reference of type ‘auto_vec<gimple*>&’ from expression of type ‘auto_vec<gimple*, 64ul>’


Is this a bug?
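
For what it is worth, the error above names auto_vec<gimple*>& rather than
vec<gimple*>&. A minimal sketch of why that matters, using hypothetical
stand-in classes (vec_sketch and auto_vec_sketch are not the real vec.h;
the assumption is only that auto_vec derives from vec for every inline
size): auto_vec with different sizes are distinct, unrelated types, while
a reference to the common base binds to any of them.

```cpp
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for the base vector type.
template <typename T>
struct vec_sketch {
  std::vector<T> data;
  void safe_push(const T &v) { data.push_back(v); }
  std::size_t length() const { return data.size(); }
};

// N models the inline (stack) capacity; each N yields a distinct type.
template <typename T, std::size_t N = 0>
struct auto_vec_sketch : vec_sketch<T> {};

// Taking the common base by reference accepts every N.
inline std::size_t count_stmts(vec_sketch<int> &v) { return v.length(); }

// A reference to the N = 0 instantiation cannot bind to the N = 64 one ...
static_assert(!std::is_convertible<auto_vec_sketch<int, 64> &,
                                   auto_vec_sketch<int> &>::value,
              "sibling instantiations are unrelated types");
// ... but a reference to the base can.
static_assert(std::is_convertible<auto_vec_sketch<int, 64> &,
                                  vec_sketch<int> &>::value,
              "derived-to-base reference binding works");
```

If the real classes are laid out the same way, declaring the parameter as
vec<gimple *> &stmts_to_fix should accept the auto_vec<gimple *, 64>.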

Thanks,
Kugan
