The problem I'd like to solve is stuff like
pxor %xmm4, %xmm4
...
movdqa %xmm4, %xmm2
pcmpgtd %xmm0, %xmm2
In that there's no point performing the copy from xmm4
rather than just emitting a new pxor insn.
The Real Problem, as I see it, is that at the point (g)cse
runs we have no visibility into the 2-operand matching
constraint on that pcmpgtd so we make the wrong choice
in sharing the zero.
If we're using AVX, instead of SSE, we don't use matching
constraints and given the 3-operand insn, hoisting the zero
is the right and proper thing to do because we won't need
to emit that movdqa.
Of course, this fires for normal integer code as well.
Some cases it's a clear win:
-: 41 be 1f 00 00 00 mov $0x1f,%r14d
...
-: 4c 89 f1 mov %r14,%rcx
+: b9 1f 00 00 00 mov $0x1f,%ecx
sometimes not (increased code size):
-: 41 bd 01 00 00 00 mov $0x1,%r13d
-: 4d 89 ec mov %r13,%r12
+: 41 bc 01 00 00 00 mov $0x1,%r12d
+: 41 bd 01 00 00 00 mov $0x1,%r13d
although the total difference is minimal, and ambiguous:
new text old text
cc1 13971302 13971342
cc1plus 15882736 15882728
Also, note that in the first case above, r14 is otherwise
unused, and we wind up with an unnecessary save/restore of
the register in the function.
Thoughts?
r~
diff --git a/gcc/reload.c b/gcc/reload.c
index e42cc5c..fb928be 100644
--- a/gcc/reload.c
+++ b/gcc/reload.c
@@ -2534,6 +2534,38 @@ safe_from_earlyclobber (rtx op, rtx clobber)
early_data = decompose (clobber);
return immune_p (op, clobber, early_data);
}
+
+/* For matching operand constraints, we may need to make a reg-reg copy.
+ In some cases the source reg is equivalent to a constant, and it's
+ more efficient to set the dest reg from the constant rather than the
+ source reg. Return the source to use for the reload. */
+
+static rtx
+matching_constraint_src (rtx src)
+{
+ unsigned regno;
+ rtx c;
+
+ if (!REG_P (src))
+ return src;
+
+ regno = REGNO (src);
+ if (HARD_REGISTER_NUM_P (regno))
+ {
+ regno = ORIGINAL_REGNO (src);
+ if (HARD_REGISTER_NUM_P (regno))
+ return src;
+ }
+ c = reg_equiv_constant (regno);
+ if (c == NULL)
+ return src;
+
+ /* Only use the constant when it's just as cheap as a reg move. */
+ if (set_src_cost (c, optimize_function_for_speed_p (cfun)) == 0)
+ return c;
+ else
+ return src;
+}
/* Main entry point of this file: search the body of INSN
for values that need reloading and record them with push_reload.
@@ -4049,28 +4081,28 @@ find_reloads (rtx insn, int replace, int ind_levels,
int live_known,
else if (modified[i] == RELOAD_READ
&& modified[goal_alternative_matched[i]] == RELOAD_WRITE)
{
+ int a = goal_alternative_matched[i];
operand_reloadnum[i]
- = push_reload (recog_data.operand[i],
- recog_data.operand[goal_alternative_matched[i]],
+ = push_reload (matching_constraint_src (recog_data.operand[i]),
+ recog_data.operand[a],
recog_data.operand_loc[i],
-
recog_data.operand_loc[goal_alternative_matched[i]],
+ recog_data.operand_loc[a],
(enum reg_class) goal_alternative[i],
- operand_mode[i],
- operand_mode[goal_alternative_matched[i]],
+ operand_mode[i], operand_mode[a],
0, 0, i, RELOAD_OTHER);
- operand_reloadnum[goal_alternative_matched[i]] = output_reloadnum;
+ operand_reloadnum[a] = output_reloadnum;
}
else if (modified[i] == RELOAD_WRITE
&& modified[goal_alternative_matched[i]] == RELOAD_READ)
{
- operand_reloadnum[goal_alternative_matched[i]]
- = push_reload (recog_data.operand[goal_alternative_matched[i]],
+ int a = goal_alternative_matched[i];
+ operand_reloadnum[a]
+ = push_reload (matching_constraint_src (recog_data.operand[a]),
recog_data.operand[i],
-
recog_data.operand_loc[goal_alternative_matched[i]],
+ recog_data.operand_loc[a],
recog_data.operand_loc[i],
(enum reg_class) goal_alternative[i],
- operand_mode[goal_alternative_matched[i]],
- operand_mode[i],
+ operand_mode[a], operand_mode[i],
0, 0, i, RELOAD_OTHER);
operand_reloadnum[i] = output_reloadnum;
}