Re: Instructions timing

Peter Teichmann Thu, 6 May 1999 06:50:20 -0700
In message <[EMAIL PROTECTED]>
          Russell King - ARM Linux Admin <[EMAIL PROTECTED]> wrote:

> Deborah Wallach writes:
> > I think you're misreading the (incredibly cryptic) instruction timing
> > table.  What mine says is:
> > 
> > Instruction Group                       Result Delay    Issue Cycles
> > -------------------------------------   ------------    ------------
> > load single - writeback of base         0               1
> > load single - load data zero extended   1               1
> > load single - load data sign extended   2               1
> 
> That's identical to both copies that I have here.
> 
> > The first case should only apply to the use of the base register (eg the PC
> > in Christophe's example, since his loads are PC-relative).  Since the next
> > instruction isn't using the PC, but rather the result of the LDR, this
> > timing doesn't apply.
> 
> Why doesn't it?  The wording given is 'The result delay is the number
> of cycles the next sequential instruction would stall if it used the
> result as an input'.  I read this as saying that you can ignore the
> result delay, unless the next instruction uses the previous result.
> 
> Hence, a ldr r0, [r1], #0 falls into the first category quite definitely,
> since there is writeback of base and there isn't any data extending being
> performed...

Perhaps the description in this table is a bit unclear. I think all normal
load instructions are regarded as zero-extended. I have here the SA-110 Timing
Application Note from Digital (June 1997). It says that normal load
instructions and MUL instructions have a 1-cycle result delay, and that
sign-extended load instructions have a 2-cycle result delay. BTW, I am pretty
sure that load instructions on the ARM610 also have a 1-cycle result delay.
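
To make that reading of the table concrete, here is a minimal sketch in
Python. It is not from the application note: the dictionary keys and the
function name stall_cycles are made up for illustration, and it assumes that
plain LDR/LDRB/LDRH count as "zero extended" and LDRSB/LDRSH as "sign
extended".

```python
# Result delays in cycles, copied from the table quoted above.
# The keys and the function name are hypothetical, chosen for this sketch.
RESULT_DELAY = {
    "base_writeback": 0,   # the updated base register of a load
    "load_zero_ext": 1,    # assumed: LDR, LDRB, LDRH
    "load_sign_ext": 2,    # assumed: LDRSB, LDRSH
}


def stall_cycles(result_kind, next_uses_result):
    """Cycles the next sequential instruction stalls for.

    The delay is only charged if the next instruction actually reads
    the result in question; otherwise it can be ignored.
    """
    return RESULT_DELAY[result_kind] if next_uses_result else 0


# LDR r1, [r0, #4]! followed by MOV r2, r1 (reads the loaded value)
print(stall_cycles("load_zero_ext", True))    # 1
# ... followed by ADD r2, r0, #1 (reads only the updated base)
print(stall_cycles("base_writeback", True))   # 0
# ... followed by an instruction that touches neither register
print(stall_cycles("load_sign_ext", False))   # 0
```

So a load with writeback has two results with two different delays, and which
one applies depends on which register the next instruction reads.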

On pages 8-9 it says:

2.2.4  Load Word, Load Byte, and Load Halfword Instructions

These load instructions, when they hit the Dcache (and D-TLBs), require one
cycle in each pipeline stage. On a cache or TLB miss, the load instructions
stay in the buffer stage until the requested data is available. They can read
their inputs either from the register file during the decode stage, or from
bypasses during the execute stage.

A load instruction may have 2 results:

*  An updated base register.  This is available through bypasses on
   completion of the execute stage.

*  The value loaded.  This is available through bypasses as soon as the load
   instruction leaves the buffer stage.

A load instruction will stall in the decode stage if:

*  The execute stage is still in use by the previous instruction, or the
   previous instruction (which may be a NULL instruction) is stalled in the
   execute stage.

*  The instruction requires a result generated by the buffer stage of a
   previous instruction (a memory access instruction, multiply, or system
   coprocessor access instruction), and that result is not yet available.

A load instruction will stall in the execute stage if the buffer stage is
busy. Being busy means that either the previous instruction is still in the
buffer stage, or that a previous instruction caused the Dcache to start a
cache line fill that has not yet completed. Note that this is different from
the conditions under which a data processing instruction stalls.

Table 2-6 shows the behavior of a load instruction followed by a data
processing instruction that uses the result of the load instruction. The
instruction sequence illustrated is:

0         LDR r1, [r0,+4]!
4         MOV r2, r1

It assumes that the load hits the cache.

Table 2-6 Register Conflict Between a Load Instruction and a Following Data
Processing Instruction

Fetch             Decode         Execute         Buffer          Writeback
Stage             Stage          Stage           Stage           Stage

Fetch from        -              -               -               -
X

Fetch from        Decode         -               -               -
X+4               LDR r1,[r0,+4]

Fetch from        Decode         Calculate       -               -
X+8               MOV r2, r1     r0+4

Do nothing        Do nothing     Do nothing      Read            -
                                                 contents of
                                                 r0+4  from
                                                 cache

Fetch from        Decode         Execute         Do nothing      Write new
X+12              instruction    MOV r2, r1                      r0 and r1 to
                  at X+8         using bypass                    register file
                                 as input
                                 
-                 -              -               Buffer new      Do nothing
                                                 r2
                                                 
-                 -              -               -               Write new
                                                                 r2 to
                                                                 register file
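
The timing in Table 2-6 can be reproduced with a very small in-order model.
This is only a sketch under simplifying assumptions, not a description of the
real SA-110: every load hits the cache, only register dependencies cause
stalls, and the first instruction is fetched in cycle 1, decoded in cycle 2
and executed in cycle 3. The function name schedule and the tuple layout are
hypothetical.

```python
def schedule(program):
    """Return (text, execute_cycle) for each instruction.

    program: list of (text, results, sources) where results maps each
    destination register to its result delay in cycles and sources
    lists the registers the instruction reads.
    """
    ready = {}   # register -> first cycle in which a consumer may execute
    prev = 2     # so that the first instruction executes in cycle 3
    out = []
    for text, results, sources in program:
        ex = prev + 1                      # in-order: one execute per cycle
        for reg in sources:
            ex = max(ex, ready.get(reg, 0))  # wait for pending results
        for reg, delay in results.items():
            ready[reg] = ex + 1 + delay    # delay 0 means usable next cycle
        out.append((text, ex))
        prev = ex
    return out


program = [
    # loaded value has delay 1, updated base register has delay 0
    ("LDR r1, [r0, #4]!", {"r1": 1, "r0": 0}, ["r0"]),
    ("MOV r2, r1",        {"r2": 0},          ["r1"]),
]
for text, cycle in schedule(program):
    print(cycle, text)
# 3 LDR r1, [r0, #4]!
# 5 MOV r2, r1
```

The MOV executes in cycle 5 instead of cycle 4, which is the single "do
nothing" row in the table. If the second instruction read r0 instead of r1,
the same model would place it in cycle 4 with no stall.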
-- 
Peter Teichmann

----------------------------------------------------------------------------
Email: [EMAIL PROTECTED]  WWW: rcswww.urz.tu-dresden.de/~teich-p
