I'm working on improving delay-slot scheduling and would appreciate
advice on a problem I encountered.
The problem is: how can I add support for placing a CODE_LABEL on an
instruction in a delay slot?
My impression is that this is currently not supported. One way to
implement it would be to allow labels in the sequence insns that
represent the delay slots. Another way could be to keep some state,
external to the rtl representation, that indicates the presence of a
label.
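To make the second option more concrete, here is a minimal C sketch of such external state: a table mapping the UID of an insn that sits in a delay slot to the number of a label that logically precedes it. All names here (SlotLabels, slot_labels_*) are hypothetical and not part of reorg.c, and the sketch assumes UIDs are small dense integers. A real version would also need every later pass that walks the insn stream, as well as final output, to consult the table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical side table: label_of[uid] is the number of the label that
   precedes the delay-slot insn with that UID, or -1 if there is none.
   The RTL itself is left unchanged. */
typedef struct {
    int *label_of;
    int size;
} SlotLabels;

static bool slot_labels_init(SlotLabels *t, int max_uid)
{
    t->label_of = malloc((size_t)max_uid * sizeof *t->label_of);
    if (!t->label_of)
        return false;
    for (int i = 0; i < max_uid; i++)
        t->label_of[i] = -1;          /* no label on any slot insn yet */
    t->size = max_uid;
    return true;
}

/* Record that LABEL sits in front of the delay-slot insn UID. */
static void slot_labels_set(SlotLabels *t, int uid, int label)
{
    assert(uid >= 0 && uid < t->size);
    t->label_of[uid] = label;
}

/* Label preceding the delay-slot insn UID, or -1. */
static int slot_labels_get(const SlotLabels *t, int uid)
{
    return (uid >= 0 && uid < t->size) ? t->label_of[uid] : -1;
}

static void slot_labels_free(SlotLabels *t)
{
    free(t->label_of);
    t->label_of = NULL;
    t->size = 0;
}
```

In the second example below, the andi in the beq's delay slot would get an entry for $L7. The trade-off against labels inside the sequence insn is that the RTL stays valid for existing code, but the information is invisible to any code that does not know about the table.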
To illustrate why I think this would be useful, let's look at two
related examples of MIPS code for which delay-slot filling is currently
not done.
Note: MIPS has a single delay slot, which may be annulled (annulling
jumps are called branch likely insns on MIPS).
The first example looks like this:
...
beq $2,$0,$L5
nop
lw $3,4($4)
addiu $2,$2,1
...
$L5:
addiu $2,$2,1
...
...
where the beq owns the target thread $L5; in other words, the beq is the
only way into $L5. Note that the beq also owns the fall-through thread
(starting at the lw insn).
The duplicate insn 'addiu $2,$2,1' can be hoisted into the delay slot. This
already happens when branch likely insns are enabled. The mechanism works as
follows: first the code is transformed into:
...
beql $2,$0,$L5
addiu $2,$2,1
lw $3,4($4)
addiu $2,$2,1
...
$L5:
...
...
using an annulling jump (beql), and only then into:
...
beq $2,$0,$L5
addiu $2,$2,1
lw $3,4($4)
...
$L5:
...
...
by try_merge_delay_insns.
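The second step can be modelled with a small C sketch. This is not the reorg.c code: insns are reduced to their text plus register def/use bitmasks, memory and labels are ignored, and the names Insn and merge_annulled_slot are hypothetical. The idea is to scan the fall-through thread for a copy of the annulled slot insn that can be moved up past the insns skipped on the way (here the lw), delete that copy, and clear the annul flag, which turns the beql back into a plain beq.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define REG(n) (1u << (n))

typedef struct {
    const char *text;  /* assembly text; equal text stands in for equal patterns */
    unsigned defs;     /* registers written (bitmask) */
    unsigned uses;     /* registers read (bitmask) */
} Insn;

/* Look in THREAD[0..*n) for a copy of SLOT that can be moved above every
   insn before it: it must not read what they write, nor write what they
   read or write.  On success, delete the copy, clear *ANNUL (the slot now
   does useful work on both paths) and return true. */
static bool merge_annulled_slot(const Insn *slot, bool *annul,
                                Insn *thread, size_t *n)
{
    assert(*annul);  /* only an annulled (branch likely) slot is merged here */
    unsigned skipped_defs = 0, skipped_uses = 0;
    for (size_t i = 0; i < *n; i++) {
        const Insn *t = &thread[i];
        bool independent = !(t->uses & skipped_defs)
                        && !(t->defs & (skipped_defs | skipped_uses));
        if (independent && strcmp(t->text, slot->text) == 0) {
            memmove(&thread[i], &thread[i + 1],
                    (*n - i - 1) * sizeof *thread);
            (*n)--;
            *annul = false;
            return true;
        }
        skipped_defs |= t->defs;
        skipped_uses |= t->uses;
    }
    return false;
}
```

With the fall-through thread `lw $3,4($4); addiu $2,$2,1` the addiu is independent of the lw and gets merged. If the load wrote $2 instead, the scan would stop short of a match and the beql would stay annulled.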
A problem with newer MIPSes is that the branch likely instructions carry
a performance penalty and are deprecated. However, if we disable branch
likely instructions, the transformation above no longer happens.
I wrote some code that detects the duplicate in this case and
implements the transformation by deleting the insn in the fall-through
thread and importing the copy from the target thread into the delay
slot. This transformation happens independently of branch likely insns,
and in a single step.
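A hypothetical C sketch of such a single-step transformation, on the same kind of toy representation (insn text plus register bitmasks; memory, labels and the real RTL are ignored). Branch, hoist_common_insn and the ownership flags are illustrative names; this is neither the code described above nor reorg.c. The head of the target thread moves into the empty, non-annulled slot, and a movable duplicate is deleted from the fall-through thread. Because both originals disappear, the sketch requires the branch to own both threads.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define REG(n) (1u << (n))

typedef struct {
    const char *text;  /* assembly text; equal text stands in for equal patterns */
    unsigned defs;     /* registers written (bitmask) */
    unsigned uses;     /* registers read (bitmask) */
} Insn;

/* A branch with one delay slot and its two threads as straight-line
   arrays.  The owns_* flags say whether the branch is the only way in. */
typedef struct {
    Insn slot;
    bool slot_filled;  /* false: the slot holds a nop */
    Insn *target;
    size_t ntarget;
    Insn *fall;
    size_t nfall;
    bool owns_target, owns_fall;
} Branch;

static void delete_at(Insn *a, size_t *n, size_t i)
{
    assert(i < *n);
    memmove(&a[i], &a[i + 1], (*n - i - 1) * sizeof *a);
    (*n)--;
}

/* Index of a copy of INSN in the fall-through thread that can be moved up
   past every insn before it (register conflicts only), or -1. */
static long find_movable_copy(const Branch *b, const Insn *insn)
{
    unsigned skipped_defs = 0, skipped_uses = 0;
    for (size_t i = 0; i < b->nfall; i++) {
        const Insn *t = &b->fall[i];
        bool independent = !(t->uses & skipped_defs)
                        && !(t->defs & (skipped_defs | skipped_uses));
        if (independent && strcmp(t->text, insn->text) == 0)
            return (long)i;
        skipped_defs |= t->defs;
        skipped_uses |= t->uses;
    }
    return -1;
}

/* Fill an empty slot with the head of the target thread when the
   fall-through thread holds a movable copy of it. */
static bool hoist_common_insn(Branch *b)
{
    if (b->slot_filled || !b->owns_target || !b->owns_fall || b->ntarget == 0)
        return false;
    long j = find_movable_copy(b, &b->target[0]);
    if (j < 0)
        return false;
    b->slot = b->target[0];  /* copy the insn before deleting it */
    b->slot_filled = true;
    delete_at(b->target, &b->ntarget, 0);
    delete_at(b->fall, &b->nfall, (size_t)j);
    return true;
}
```

On the first example this yields `beq; addiu` with the lw left in the fall-through thread. On the second example it refuses, because neither thread is owned.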
However, that doesn't work for the second example:
...
beq $3,$0,$L14
nop
$L7:
andi $2,$2,0xffff
...
bne $3,$0,$L7
nop
$L14:
andi $2,$2,0xffff
...
...
What is different from the first example is that here the beq owns
neither the fall-through thread ($L7) nor the target thread ($L14). The
same holds for the bne. In the first example, the jump owns both
threads.
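Ownership comes down to counting the ways into a thread's first insn: jumps to a label there, plus fall-through from the preceding code. The branch under consideration is always one of these entries. reorg.c makes this decision in own_thread_p by inspecting the RTL. The C sketch below does not mirror that code. ThreadEntry and owns_thread are hypothetical names, and the sketch only illustrates the counting argument for both examples.

```c
#include <assert.h>
#include <stdbool.h>

/* How control can reach the first insn of a thread. */
typedef struct {
    int label_uses;       /* jumps referencing a label at the thread head */
    bool fallthrough_in;  /* can the preceding insn fall into the head? */
} ThreadEntry;

static int entry_count(ThreadEntry e)
{
    return e.label_uses + (e.fallthrough_in ? 1 : 0);
}

/* The branch is one entry (via its label for the target thread, via its
   fall-through for the other), so it owns the thread exactly when there
   is no other entry. */
static bool owns_thread(ThreadEntry e)
{
    assert(entry_count(e) >= 1);
    return entry_count(e) == 1;
}
```

For the first example, $L5 is referenced only by the beq and nothing falls into it, and the lw is reached only by falling through the beq. In the second example, $L7 is reached by the bne and by the beq's fall-through, and $L14 by the beq and by the bne's fall-through, so every count is 2.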
We can think of this transformation:
...
beq $3,$0,$L14new
$L7:
andi $2,$2,0xffff
...
bne $3,$0,$L7
nop
andi $2,$2,0xffff
$L14new:
...
...
but here the label $L7 ends up in the delay slot together with the andi.
Subsequently, we fill the delay slot holding the second nop in the
normal fashion:
...
beq $3,$0,$L14new
andi $2,$2,0xffff
$L7new:
...
bne $3,$0,$L7new
andi $2,$2,0xffff
$L14new:
...
...
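One way to convince oneself that these rewrites of the second example preserve behaviour is to compare execution traces. The C sketch below is a toy interpreter, not part of GCC. Every branch has one non-annulled delay slot, and labels are replaced by insn indices, so a label on a delay-slot insn (like $L7 in the intermediate step) is just an index. Branch outcomes are supplied by the caller, and the elided code is reduced to hypothetical "body" and "tail" insns. All three programs should produce the same trace for any sequence of outcomes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define LEN(a) ((int)(sizeof (a) / sizeof (a)[0]))

typedef struct {
    const char *op;  /* "nop", "andi", "body", "tail", or a branch mnemonic */
    bool is_branch;
    int target;      /* branches only: index of the insn carrying the label */
} Insn;

/* Run PROG, taking branch outcomes from TAKEN in execution order (missing
   entries mean "not taken").  Each executed non-nop, non-branch insn is
   appended to OUT, space-separated. */
static void trace(const Insn *prog, int n, const bool *taken, int ntaken,
                  char *out, size_t cap)
{
    int pc = 0, k = 0;
    out[0] = '\0';
    while (pc < n) {
        const Insn *in = &prog[pc];
        const Insn *exec = in;
        int next = pc + 1;
        if (in->is_branch) {
            assert(pc + 1 < n && !prog[pc + 1].is_branch);
            bool t = k < ntaken ? taken[k++] : false;
            exec = &prog[pc + 1];  /* the delay slot runs on both paths */
            next = t ? in->target : pc + 2;
        }
        if (strcmp(exec->op, "nop") != 0) {
            size_t len = strlen(out);
            snprintf(out + len, cap - len, "%s%s", len ? " " : "", exec->op);
        }
        pc = next;
    }
}

static const Insn original[] = {
    {"beq", true, 6},    /* beq $3,$0,$L14 */
    {"nop", false, 0},
    {"andi", false, 0},  /* $L7 */
    {"body", false, 0},
    {"bne", true, 2},    /* bne $3,$0,$L7 */
    {"nop", false, 0},
    {"andi", false, 0},  /* $L14 */
    {"tail", false, 0},
};
static const Insn step1[] = {
    {"beq", true, 6},    /* beq $3,$0,$L14new */
    {"andi", false, 0},  /* $L7, a label on the delay-slot insn */
    {"body", false, 0},
    {"bne", true, 1},    /* bne $3,$0,$L7 */
    {"nop", false, 0},
    {"andi", false, 0},
    {"tail", false, 0},  /* $L14new */
};
static const Insn filled[] = {
    {"beq", true, 5},    /* beq $3,$0,$L14new */
    {"andi", false, 0},
    {"body", false, 0},  /* $L7new */
    {"bne", true, 2},    /* bne $3,$0,$L7new */
    {"andi", false, 0},
    {"tail", false, 0},  /* $L14new */
};
```

For example, outcomes (not taken, taken, not taken) give "andi body andi body andi tail" for all three versions.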
So, how easy would it be to support this 'label in delay slot' in
reorg.c? Or is there an easier way to fill the delay slots in the second
example?
I thought of enabling branch likely insns for the duration of the reorg
pass and transforming leftover branch likely insns back into normal
insns after the pass, but that sometimes comes at a penalty.
Thanks,
- Tom