On Mar 10, 2012, at 1:26 PM, Brian Grayson wrote:

> I am running a testcase with decodeWidth=4, fetchWidth=8, and
> fetchToDecodeDelay set to 1.  This results in skidBufferMax being computed
> as 1*8+4 = 12.
> 
> 
> 
> However, in one workload, the verbose debug log shows that:
> 
> - Decode starts processing 8 instructions, and becomes blocked after 4
>   (since decodeWidth is only 4).  The remaining 4 then get inserted into
>   the skidBuffer.
>
> - Fetch had already sent the next 8 instructions, so in the next cycle
>   these next 8 instructions enter the skidBuffer.  At this point, the
>   skidBuffer is full.
>
> - In the next cycle, Fetch generates a Translation-fault NoOp, even though
>   Fetch is blocked.  This then flows down to Decode, where it tries to put
>   it into the skidBuffer, and asserts start to fire.
> 
> 
> 
> A quick workaround for this is to increase skidBufferMax by one more.  But I
> do not think this is the right fix - even with that, setting the decodeWidth
> to 3 causes the simulator to assert.
> 
> 
> 
> I think the current flow-control between Fetch and Decode is such that with
> a 1-cycle delay, the skidBuffer must be able to hold two full fetches, minus
> what it is _guaranteed_ decode can remove.  For the decode-3 example, out of
> the first batch of 8 instructions, 5 need to go into the skidBuffer.  Then
> next cycle another 8 arrive.  So it seems skidBufferMax should be at least
> (fetchToDecodeDelay*params->fetchWidth) + (fetchWidth - decodeWidth) + 1 /*
> to handle translation faults? */
> 
> 
> 
> However, this is also not sufficient, probably because Decode doesn't
> guarantee that it can always decode 3 instructions.  I did a sweep over
> fetchWidths and decodeWidths from 1 to 8 (64 combinations), and even with
> the above, 13 of the 64 combos failed.
> 
> 
> 
> Empirically, setting it to ((fetchToDecodeDelay+1) *  params->fetchWidth)
> appears to suffice for my simple toy workload across all 64 configs, but I'm
> sure someone else can figure out the bug-free proper value to use that is
> guaranteed to be correct.
> 
> 
> 
> Thanks.
> 
> 
> 
> Brian Grayson
> 

Hi Brian,

My gut reaction is that ((fetchToDecodeDelay+1) * params->fetchWidth) is
probably right, since decode is not guaranteed to remove any instructions and
the communication delay means that the stall won't reach fetch for an extra
cycle. Actually, it might be ((fetchToDecodeDelay+1) * params->fetchWidth) - 1:
if the buffer had been full last cycle, a stall should already have happened,
so there must be at least one slot available.
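
To make the arithmetic concrete, here is a rough sketch of the three candidate
sizes discussed in this thread. The function names are hypothetical and this is
not the actual gem5 code. The first formula is only inferred from the
"1*8+4 = 12" figure in the report, and fetchWidth is assumed to be at least 1.

```cpp
// Hypothetical helpers for comparing skidBufferMax candidates; not gem5 code.

// Sizing consistent with the "1*8+4 = 12" figure in the report
// (assumed form: delay * fetchWidth + decodeWidth).
constexpr unsigned originalBound(unsigned fetchToDecodeDelay,
                                 unsigned fetchWidth,
                                 unsigned decodeWidth)
{
    return fetchToDecodeDelay * fetchWidth + decodeWidth;
}

// Conservative sizing: decode may remove nothing, and the stall takes
// fetchToDecodeDelay cycles to reach fetch, so one extra full fetch
// group can still arrive.
constexpr unsigned conservativeBound(unsigned fetchToDecodeDelay,
                                     unsigned fetchWidth)
{
    return (fetchToDecodeDelay + 1) * fetchWidth;
}

// Possibly tighter sizing: one less than the conservative bound.
// Assumes fetchWidth >= 1 so the subtraction cannot wrap around.
constexpr unsigned tightBound(unsigned fetchToDecodeDelay,
                              unsigned fetchWidth)
{
    return conservativeBound(fetchToDecodeDelay, fetchWidth) - 1;
}

// Configuration from the report: delay = 1, fetchWidth = 8, decodeWidth = 4.
static_assert(originalBound(1, 8, 4) == 12, "sizing from the report");
static_assert(conservativeBound(1, 8) == 16, "two full fetch groups");
static_assert(tightBound(1, 8) == 15, "one slot fewer");
```

In the reported scenario the skid buffer has to hold 4 + 8 + 1 = 13 entries,
which exceeds 12 but fits within either of the larger bounds. Whether the
"- 1" variant is safe in every configuration would still need to be checked
with a sweep like the one you ran.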

Thanks,
Ali

_______________________________________________
gem5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/gem5-dev
