Handling labels in delay-slot scheduling

Tom de Vries Thu, 18 Nov 2010 09:31:15 -0800

I'm working on improving delay-slot scheduling and would appreciate advice on a
problem I encountered.

The problem is: how to add support for placing a CODE_LABEL on an instruction in
a delay slot?

My impression is that this is not supported currently. One way to implement this
would be to allow labels in the sequence insns which represent the delay slots.
Another way could be to keep some state external to the rtl representation that
indicates the presence of a label.

To illustrate why I think that would be useful, let's look at 2 related examples
of MIPS code, for which delay slot filling is currently not done.

Note: The MIPS has a single delay slot, possibly annulling (annulling
jumps are called branch likely insns for MIPS).

The first example looks like this:
...
    beq    $2,$0,$L5
    nop
    lw    $3,4($4)
    addiu    $2,$2,1
    ...
$L5:
    addiu    $2,$2,1
    ...
...
where the beq owns the target thread $L5, in other words the beq is the only
way into $L5. Note that the beq also owns the fall-through thread (starting at
the lw insn).

The duplicate insn 'addiu $2,$2,1' can be hoisted into the delay slot. This
already happens when branch likely insns are enabled. The mechanism works as
follows: first the code is transformed into:
...
    beql    $2,$0,$L5
    addiu    $2,$2,1
    lw    $3,4($4)
    addiu    $2,$2,1
    ...
$L5:
    ...
...
using an annulling jump (beql), and only then into:
...
    beq    $2,$0,$L5
    addiu    $2,$2,1
    lw    $3,4($4)
    ...
$L5:
    ...
...
by try_merge_delay_insns.

A problem with newer MIPSes is that the branch likely instruction has a
performance penalty, and is deprecated. However, if we disable the branch likely
instruction, the transformation above no longer happens.

I wrote some code that detects the duplicate in this case and implements the
transformation by deleting the insn in the fall-through thread and importing the
other insn into the delay slot. This transformation happens independently of
branch likely insns, and it happens in a single step.
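
As a rough illustration of the single-step transformation described above, here
is a toy Python model that works on a list of assembly strings rather than on
rtl. It is not the code from the patch and not how reorg.c represents insns:
the function hoist_duplicate, the helpers, the string-based register check and
the 'jr $31' filler in the example are all assumptions made for this sketch. It
also assumes the caller has already established that the branch owns both
threads.

```python
import re

BRANCHES = {"beq", "bne", "b", "j", "jr", "jal"}  # toy list, not exhaustive


def regs(insn):
    """Registers mentioned by an insn (toy parser: every $name token)."""
    return set(re.findall(r"\$\w+", insn))


def is_label(line):
    return line.endswith(":")


def is_branch(insn):
    return insn.split()[0] in BRANCHES


def hoist_duplicate(insns, branch_idx):
    """Fill the nop delay slot of the branch at branch_idx with an insn that
    starts the target thread and also occurs in the fall-through thread.

    Assumption: the branch owns both threads, so both copies can be deleted.
    Returns a new list, or the original list if nothing could be done.
    """
    if insns[branch_idx + 1] != "nop":
        return insns  # delay slot already filled
    target = re.split(r"[\s,]+", insns[branch_idx])[-1] + ":"
    label_idx = insns.index(target)
    if label_idx + 1 >= len(insns):
        return insns
    candidate = insns[label_idx + 1]
    if is_label(candidate) or is_branch(candidate):
        return insns
    # Scan the fall-through thread, which starts after the delay slot.
    for i in range(branch_idx + 2, len(insns)):
        insn = insns[i]
        if insn == candidate:
            out = list(insns)
            out[branch_idx + 1] = candidate
            # Delete both thread copies, highest index first.
            for k in sorted((i, label_idx + 1), reverse=True):
                del out[k]
            return out
        # Conservative stop: end of the thread, or any shared register
        # between the candidate and an insn it would be moved across.
        if is_label(insn) or is_branch(insn) or regs(insn) & regs(candidate):
            break
    return insns


first_example = [
    "beq $2,$0,$L5", "nop",
    "lw $3,4($4)", "addiu $2,$2,1", "jr $31", "nop",
    "$L5:", "addiu $2,$2,1", "jr $31", "nop",
]
for line in hoist_duplicate(first_example, 0):
    print(line)
```

The register check is deliberately cruder than a real dependence test: it
refuses the move as soon as the candidate shares any register with an insn it
would be hoisted across.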

However, that doesn't work for the second example:
...
    beq    $3,$0,$L14
    nop
$L7:
    andi    $2,$2,0xffff
    ...
    bne    $3,$0,$L7
    nop
$L14:
    andi    $2,$2,0xffff
    ...
...

What is different from the first example is that here the beq owns neither the
fall-through thread ($L7) nor the target thread ($L14). The same holds for the
bne. In the first example, the jump owns both threads.
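
To make the ownership difference between the two examples concrete, here is a
small Python sketch that checks ownership on a list of assembly strings. It is
only a toy approximation, not the check that reorg.c performs: the function
names, the string-based label matching, the 'jr $31' filler in the first
example and the rule "every branch is followed by exactly one delay slot" are
assumptions made for this illustration.

```python
import re

UNCONDITIONAL = {"j", "jr", "b"}  # toy list of jumps that never fall through


def tokens(insn):
    return re.split(r"[\s,]+", insn.strip())


def owns_fallthrough(insns, branch_idx):
    """The branch owns its fall-through thread if no label starts that
    thread, i.e. nothing else can jump to the insn after the delay slot."""
    nxt = branch_idx + 2  # skip the single delay slot
    return nxt < len(insns) and not insns[nxt].endswith(":")


def owns_target(insns, branch_idx):
    """The branch owns its target thread if it is the only user of the
    label and control cannot fall through into the label from above."""
    label = tokens(insns[branch_idx])[-1]
    label_idx = insns.index(label + ":")
    users = [i for i, insn in enumerate(insns)
             if not insn.endswith(":") and label in tokens(insn)[1:]]
    # The insn two lines above the label is the branch whose delay slot
    # directly precedes the label; if it is not an unconditional jump,
    # execution can fall through into the label.
    falls_in = (label_idx < 2
                or tokens(insns[label_idx - 2])[0] not in UNCONDITIONAL)
    return users == [branch_idx] and not falls_in


first_example = [
    "beq $2,$0,$L5", "nop",
    "lw $3,4($4)", "addiu $2,$2,1", "jr $31", "nop",
    "$L5:", "addiu $2,$2,1", "jr $31", "nop",
]
second_example = [
    "beq $3,$0,$L14", "nop",
    "$L7:", "andi $2,$2,0xffff", "bne $3,$0,$L7", "nop",
    "$L14:", "andi $2,$2,0xffff",
]

print(owns_fallthrough(first_example, 0), owns_target(first_example, 0))
print(owns_fallthrough(second_example, 0), owns_target(second_example, 0))
print(owns_fallthrough(second_example, 4), owns_target(second_example, 4))
```

In the second example each label has only one user, but the code above it falls
through into it, which is why neither branch owns its target thread.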

We can think of this transformation:
...
    beq    $3,$0,$L14new
$L7:
    andi    $2,$2,0xffff
    ...
    bne    $3,$0,$L7
    nop
    andi    $2,$2,0xffff
$L14new:
    ...
...
but here the label $L7 ends up in the delay slot together with the andi.

Subsequently we transform the second nop in normal fashion:
...
    beq    $3,$0,$L14new
    andi    $2,$2,0xffff
$L7new:
    ...
    bne    $3,$0,$L7new
    andi    $2,$2,0xffff
$L14new:
    ...
...

So, how easy is it to support this 'label in delay slot' in reorg.c? Or is there
an easier way to achieve the filling of the delay slots in the second example?

I thought of enabling branch likely insns for the duration of reorg.c, and
transforming leftover branch likely insns back to normal insns after the reorg
pass, but that comes (sometimes) at a penalty.

Thanks,
- Tom
