Sorry, the last email did not really have the problem description. Here it is:
For a multiply instruction, the following are the resource requirements at
pipeline stages 1-4:
stage-1: inst has two resource requests here - UseDefUnit::ReadSrcReg
and MultDivUnit::StartMultDiv
stage-2: inst has one resource request here - MultDivUnit::EndMultDiv
stage-3: inst has zero resource requests here
stage-4: inst has two resource requests here - UseDefUnit::WriteDestReg
and GraduationUnit::GraduateInst.
At stage 4, RegDepMap::canWrite() does not check whether the instruction
has finished executing and, thus, returns TRUE. Therefore, GraduationUnit
graduates the instruction (with an incorrect destination register
result), leaving the architectural state of the register file
incorrect for multiplier latencies >= 4. Since there is a 3-cycle gap
between stage 1 and stage 4, the result is already available by the time
the instruction reaches stage 4 for multiplier latencies of 1-3, so the
problem does not arise there.
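The timing argument above can be written down as a minimal sketch. This is not M5 code: the function names and the assumption that each stage takes exactly one cycle are hypothetical, chosen only to mirror the stage layout described in this message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical constants mirroring the stage layout described above. */
enum { ISSUE_STAGE = 1, WRITEBACK_STAGE = 4 };

/* Cycle, relative to StartMultDiv in stage 1, at which the product exists. */
int result_ready_cycle(int mult_latency)
{
    assert(mult_latency >= 1);
    return mult_latency;
}

/* Cycle, relative to stage 1, at which an unstalled inst reaches stage 4
 * (assumes one cycle per stage, hence the 3-cycle gap). */
int writeback_cycle(void)
{
    return WRITEBACK_STAGE - ISSUE_STAGE;
}

/* Behaviour as reported: the write is allowed without looking at whether
 * the instruction has executed. */
bool can_write_unchecked(int mult_latency)
{
    (void)mult_latency;
    return true;
}

/* Guarded variant (an assumption, not the actual fix): allow the write
 * only if the result is ready by the time stage 4 is reached. */
bool can_write_checked(int mult_latency)
{
    return result_ready_cycle(mult_latency) <= writeback_cycle();
}

/* True when the unchecked path would write a result that is not ready. */
bool writes_stale_result(int mult_latency)
{
    return can_write_unchecked(mult_latency) &&
           !can_write_checked(mult_latency);
}
```

Under these assumptions, writes_stale_result() is false for latencies 1, 2, and 3 and true from latency 4 onwards, which matches the behaviour reported here.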
As a consequence of the above, the simulation output for the
following test program:
#include <stdio.h>

int main(int argc, char *argv[])
{
    /* argc is multiplied twice, so the multiplier result feeds the output */
    printf("%d*%d*%d = %d\n", argc, argc, argc, argc*argc*argc);
    return 0;
}
when run as "a.out 1 1" (so that argc is 3), used to be incorrect when
the multiplier latency is 4 cycles (and, possibly, higher). The output is
expected to be 3*3*3 = 27 but it came out as 3*3*3 = 0. It works
correctly for latencies 1 (default), 2, and 3.
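The approach discussed in the quoted thread below (the instruction should not leave stage 2 until the multiply has finished) can be pictured with a similar sketch. Again, this is a toy model and not M5 code: cycle_at_stage4(), result_ready_at_stage4(), and the hold_in_stage2 flag are hypothetical names, and one cycle per stage is assumed.

```c
#include <assert.h>
#include <stdbool.h>

/* Cycle, relative to StartMultDiv in stage 1, at which the instruction
 * reaches stage 4. If hold_in_stage2 is true, the EndMultDiv request is
 * treated as completing only once mult_latency cycles have elapsed, so
 * the instruction waits in stage 2 until then. */
int cycle_at_stage4(int mult_latency, bool hold_in_stage2)
{
    assert(mult_latency >= 1);
    int cycle = 1;                 /* stage 2 is reached one cycle after issue */
    if (hold_in_stage2 && mult_latency > cycle)
        cycle = mult_latency;      /* stall in stage 2 until the product exists */
    return cycle + 2;              /* stages 3 and 4 take one cycle each */
}

/* True if the product exists by the time the instruction is in stage 4. */
bool result_ready_at_stage4(int mult_latency, bool hold_in_stage2)
{
    return cycle_at_stage4(mult_latency, hold_in_stage2) >= mult_latency;
}
```

In this model, a latency of 4 without the hold reaches stage 4 at cycle 3, before the product is ready; with the hold it reaches stage 4 at cycle 6, after the product is ready.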
-Soumyaroop
On Mon, Dec 14, 2009 at 2:37 PM, soumyaroop roy <[email protected]> wrote:
> This is the multiplier bug in inorder that was fixed a few months ago.
>
> -Soumyaroop
>
>
> ---------- Forwarded message ----------
> From: Korey Sewell <[email protected]>
> Date: Wed, Sep 16, 2009 at 10:02 AM
> Subject: Re: Multiplier bug
> To: soumyaroop roy <[email protected]>
>
>
> Hmmm... GREAT CATCH.... I see the problem. The instruction should NOT
> leave stage 2 and complete the EndMultDiv request if the instruction
> has not been executed. Maybe the latency is not getting passed to the
> EndMultDiv event correctly.
>
>> 1. Stall the instruction from going past stage-2 till the
>> multiplication finishes by not setting the "completed" flag for the
>> resource request. However, that does not seem right.
>>
> That's right. The trick is, if you want a multicycle latency op, then
> you should schedule the resource request to finish X cycles later, not
> on the next cycle. Apologies again for the lack of documentation. On the
> m5sim.org wiki page (http://m5sim.org/wiki/index.php/InOrder), I've
> got the TODO list for docs as:
>
> DOCUMENTATION TODO:
>
> (1) Pipeline Stages
>
> -->First Stage
>
> (2) Resources
>
> (3) Resource Pool
>
> (4) Pipeline Definition
>
> (5) Instruction Schedules
>
> Anything else pertinent that you think needs to be included to help
> understand? You can add it to the wiki TODO or just let me know.
>
>
>>
>> 2. Check at stage 4 from within the UseDefUnit if the result of the
>> instruction is ready or not by using one of the
>> isExecuted()/isCompleted()/isResultReady() accessors.
>
> This *should* be happening in the mult_div_unit.cc file. Can't be
> setting to "completed" if it hasn't executed!
>
>
>>
>> I have not used the stageTracing yet. I am not sure if it will be of
>> any further help anymore.
>
> Looks like you've identified it.
>
>>
>> Also, it appears (from pipeline_traits.cc)
>> that register read and issue to the arithmetic functional units take
>> place during the same cycle, like the MIPS processor. Is that correct?
>
> Yes. I made the model extensible though so that you can change it
> simply by editing that function. Hopefully people find that useful :)
>
>
>
>
> --
> - Korey
>
>
>
> --
> Soumyaroop Roy
> Ph.D. Candidate
> Department of Computer Science and Engineering
> University of South Florida, Tampa
> http://www.csee.usf.edu/~sroy
>