Hello all,
I've been looking at a code generation issue with GCC 5.2 where, at
-O3 -funroll-loops, a register-to-register move is done through memory instead.
For reference, the C code is at the end of this mail. The generated code for
MIPS is (cut down for clarity; ldc1 and sdc1 are the doubleword floating-point
load and store respectively):
div.d $f8,$f6,$f4
mul.d $f2,$f8,$f8
sdc1 $f2,8($7)
$L38:
ldc1 $f0,8($7) <- load instead of move
li $11,1 # 0x1
<snip>
$L49:
....
div.d $f8,$f6,$f4
addiu $11,$10,3
mul.d $f2,$f8,$f8
sdc1 $f2,8($7)
ldc1 $f0,8($7) <- load instead of move
$L48:
mul.d $f2,$f2,$f0
<snip>
$L45:
mul.d $f2,$f4,$f4
mov.d $f8,$f4
j $L38
sdc1 $f2,8($7)
For basic block $L38, every block shown that leads into it stores to 8($7),
and that value is then loaded straight back into another floating-point
register.
Disabling predictive commoning (-fno-predictive-commoning) generates:
div.d $f4,$f18,$f2
mul.d $f0,$f4,$f4
$L37:
mul.d $f6,$f0,$f0
li $10,1 # 0x1
mul.d $f8,$f0,$f6
mul.d $f10,$f0,$f8
mul.d $f12,$f0,$f10
mul.d $f14,$f0,$f12
mul.d $f16,$f0,$f14
beq $4,$10,$L38
mul.d $f20,$f0,$f16
For the same basic block.
Following Jeff's advice[1] to extract more information from GCC, I've narrowed
the cause down to the predictive commoning pass inserting the load in a
loop-header-style basic block. However, the next pass in GCC, tree-cunroll,
promptly removes the loop and joins the loop header to the body of the
(non-)loop. More oddly, disabling any one of the conditional store elimination
pass, the dominator optimizations pass, or jump threading (with --param
max-jump-thread-duplication-stmts=0) also yields the second assembly listing
above. Any ideas on an approach for this issue?
[1] https://gcc.gnu.org/ml/gcc-help/2015-08/msg00162.html
Thanks,
Simon
double N;
int i1;
double T;
double poly[9];

void
g (int iterations)
{
  int count = 0;
  for (count = 0; count < iterations; count++)
    {
      if (N > 1)
        {
          T = 1 / N;
        }
      else
        {
          T = N;
        }
      poly[1] = T * T;
      for (i1 = 2; i1 <= 8; i1++)
        {
          poly[i1] = poly[i1 - 1] * poly[1];
        }
    }
  return;
}
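
For comparison, here is a rough sketch of a hand-scalarised variant of the
testcase, where the value stored to poly[1] is also kept in a local so that the
multiply chain never needs to reload it. This only illustrates the code shape I
would expect the compiler to reach on its own; g_scalar, t, p1 and prev are
names made up for this sketch and are not part of the original testcase, and I
have not checked what GCC 5.2 generates for it.

```c
/* Globals as in the testcase above. */
double N;
int i1;
double T;
double poly[9];

/* Hypothetical hand-scalarised variant of g (); illustration only.  */
void
g_scalar (int iterations)
{
  int count;
  for (count = 0; count < iterations; count++)
    {
      double t = (N > 1) ? 1 / N : N;
      double p1 = t * t;   /* value of poly[1], kept in a local */
      double prev = p1;    /* value of poly[i1 - 1] */

      T = t;
      poly[1] = p1;        /* the store still happens, but is never reloaded */
      for (i1 = 2; i1 <= 8; i1++)
        {
          prev = prev * p1;
          poly[i1] = prev;
        }
    }
}
```

The stores to T, poly[] and i1 are kept so that the globals end up with the
same values as in g (); only the reload of poly[1] and poly[i1 - 1] is
replaced by locals.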