Thank you, Mitch. I had just come back to my computer after thinking through
why the order in which pipeline stages are ticked does not matter with a time
buffer. Your email arrived just in time. It confirms what I had worked out and
gives me more insight into the simulator design. I really appreciate it.

Best,
Chen

On Sat, Jan 26, 2013 at 6:23 PM, Mitch Hayenga <[email protected]> wrote:

> "If both answers are t-1, which means the output of any stage only depends
> on some other stages' output at previous cycle, then I can understand why
> time buffer can get rid of the dependencies. However, if a stage requires
> a result from another stage at the same cycle, I cannot see how this works.
> Maybe hardware never does that"
>
> This is correct.  In general, pipestages in gem5 should only use
> timebuffers to communicate with each other.  Otherwise dependencies would
> form, requiring them to be clocked in a specific order.  So timebuffers are
> only used to communicate across cycles (and never between two stages on the
> same cycle).
>
> Let's explain timebuffers this way.  From SimpleScalar you are used
> to doing things like "clocking decode before fetch".  The reason you do
> this is so that you empty out decode's queue of instructions before
> allowing fetch to re-fill it.  If you had clocked it in the other fashion,
> your simulator might "cheat" and allow things fetched on the current cycle
> to make their way through decode as well.  Basically, you need
> some guarantee of how you pass values around, and clocking things in
> reverse lets you do this with existing structures without "cheating".
>
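A minimal sketch of the "cheating" problem described above. It is a toy model, not SimpleScalar code: the `run` function, the shared `latch` list, and the integer instruction ids are all assumptions made for illustration. Fetch writes directly into decode's input queue, so the result depends on the order in which the two stages are clocked.

```python
def run(order, cycles=3):
    """Clock a toy fetch/decode pair in the given order.

    Returns a list of (cycle, instruction) pairs recording when each
    instruction was decoded.
    """
    latch = []        # decode's input queue, written directly by fetch
    decoded = []
    next_instr = 0

    def fetch(cycle):
        nonlocal next_instr
        latch.append(next_instr)   # push one new instruction per cycle
        next_instr += 1

    def decode(cycle):
        if latch:
            decoded.append((cycle, latch.pop(0)))

    stages = {"fetch": fetch, "decode": decode}
    for cycle in range(cycles):
        for name in order:
            stages[name](cycle)
    return decoded


# Reverse order: decode drains its queue before fetch refills it.
reverse = run(["decode", "fetch"])
# Forward order: an instruction is fetched and decoded in the same cycle.
forward = run(["fetch", "decode"])
```

In this toy model, clocking in reverse makes instruction 0 reach decode on cycle 1, while clocking forward lets it pass through both stages on cycle 0, which is the "cheat" the reverse ordering is meant to prevent.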
> Instead of doing things this way, pipestages in gem5 do not write directly
> to another pipestage's storage structures.  Instead, think of each
> pipestage as keeping a "queue of its outputs for the last n cycles".  This
> is essentially what the "timebuffer" is: a queue of the outputs generated
> by a pipestage over the last n cycles.  So, decode will typically look at
> whatever fetch produced on the cycle before; that is, it will index by -1
> (the last cycle) into the respective timebuffer.
>
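The idea can be sketched with a toy buffer. This is not gem5's actual TimeBuffer class: `ToyTimeBuffer`, its methods, and `run` are hypothetical names, and the buffer keeps every past cycle in a dict rather than only the last n. Fetch writes into the current cycle's slot and decode reads the slot at index -1, so both clocking orders give the same result.

```python
class ToyTimeBuffer:
    """Toy stand-in for a timebuffer: one output slot per cycle."""

    def __init__(self):
        self.slots = {}   # absolute cycle -> value written on that cycle
        self.now = 0

    def write(self, value):
        self.slots[self.now] = value

    def read(self, offset):
        # offset -1 means "whatever was written on the previous cycle"
        return self.slots.get(self.now + offset)

    def advance(self):
        self.now += 1


def run(order, cycles=4):
    """Clock fetch and decode in the given order; return (cycle, instr) pairs."""
    buf = ToyTimeBuffer()
    decoded = []
    next_instr = 0

    def fetch():
        nonlocal next_instr
        buf.write(next_instr)
        next_instr += 1

    def decode():
        instr = buf.read(-1)      # never sees what fetch wrote this cycle
        if instr is not None:
            decoded.append((buf.now, instr))

    stages = {"fetch": fetch, "decode": decode}
    for _ in range(cycles):
        for name in order:
            stages[name]()
        buf.advance()             # time moves only after all stages ticked
    return decoded


forward = run(["fetch", "decode"])
reverse = run(["decode", "fetch"])
```

Because decode only ever reads a slot from an earlier cycle, nothing fetch does on the current cycle can affect it, and the two orders produce identical traces.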
> Now this probably looks over-complicated to you.  Why should a pipestage
> keep things like a queue of its outputs (for however many cycles)?  Why not
> just have a single storage place for the outputs of fetch?  The reason gem5
> does this is that it lets us fake longer, more detailed pipelines easily.
> Pretend we are simulating a machine with a deep pipeline (3-cycle fetch):
> F1, F2, F3 followed by Decode.  A timebuffer lets us fake this deep
> pipeline without changing much of our logic.  The fetch and decode
> pipeline stages stay the same; we just tell decode to index by -3 into
> the timebuffer of results from fetch, so results show up at decode 3
> cycles after they have been fetched.  Because every stage reads only what
> was written on earlier cycles, we effectively don't care about the order
> in which pipeline stages are clocked, and we can fake pipelines with an
> arbitrary number of pipestages.
>
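A rough sketch of the 3-cycle fetch example, again using a toy buffer rather than gem5's real class. The names `ToyTimeBuffer`, `past`, and `arrival_cycles` are assumptions for illustration. The stage logic stays the same; only the index decode reads at changes.

```python
from collections import deque


class ToyTimeBuffer:
    """Toy timebuffer holding the current cycle's slot plus `past` older ones."""

    def __init__(self, past):
        self.slots = deque([None] * (past + 1), maxlen=past + 1)

    def write(self, value):
        self.slots[-1] = value            # rightmost slot is the current cycle

    def read(self, offset):
        # offset 0 is the current cycle, -1 the previous one, and so on
        return self.slots[offset - 1]

    def advance(self):
        self.slots.append(None)           # oldest slot falls off the left


def arrival_cycles(fetch_to_decode_delay, cycles=6):
    """Return (fetch_cycle, decode_cycle) pairs for a given read offset."""
    buf = ToyTimeBuffer(past=fetch_to_decode_delay)
    arrivals = []
    for cycle in range(cycles):
        buf.write(cycle)                  # fetch: tag output with its cycle
        fetched_at = buf.read(-fetch_to_decode_delay)   # decode
        if fetched_at is not None:
            arrivals.append((fetched_at, cycle))
        buf.advance()
    return arrivals


shallow = arrival_cycles(1)   # decode reads at index -1
deep = arrival_cycles(3)      # decode reads at index -3 (F1, F2, F3)
```

With an offset of -1 each instruction reaches decode one cycle after it was fetched; with -3 it arrives three cycles later, which models the extra fetch stages without adding any new stage logic.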
> Hope this clears up some of your confusion.
>
> On Sat, Jan 26, 2013 at 7:53 PM, Chen Tian <[email protected]> wrote:
>
>> Hi Nilay,
>>
>> I don't have any trouble understanding the concept of a pipeline as
>> defined in any textbook, the implementation of SimpleScalar, or even the
>> InOrder CPU model in gem5, where you update a stage and notify an earlier
>> stage at the same tick (so you go backwards). It was only today, when I
>> looked at the two pages of slides on the time buffer mentioned by Mitch
>> and tried to understand them, that I got lost. If my question looks silly
>> to you, sorry about that. After reading this thread one more time, I think
>> what I have not understood is how, in Andreas' words, "the update and the
>> notification are separated in time" by using a time buffer.
>>
>> Chen
>>
>> On Sat, Jan 26, 2013 at 1:59 PM, Nilay <[email protected]> wrote:
>>
>>> On Sat, January 26, 2013 12:01 pm, Chen Tian wrote:
>>> > Thanks everyone for your reply. I have a better understanding but still
>>> > have questions.
>>> >
>>> > Let's consider a time buffer B between two consecutive pipeline
>>> > stages X and Y. When computing Y's output at cycle t, do we need the
>>> > signal passed from X at t or t-1 (i.e., the struct in B with index t
>>> > or t-1)? Similarly, when computing X's output at cycle t, do we need
>>> > to look at the status of Y at cycle t or t-1 (e.g., whether some hw
>>> > resource is available for this cycle)?  If both answers are t-1, which
>>> > means the output of any stage only depends on some other stages'
>>> > output at the previous cycle, then I can understand why the time
>>> > buffer can get rid of the dependencies. However, if a stage requires a
>>> > result from another stage at the same cycle, I cannot see how this
>>> > works. Maybe hardware never does that -- as it is not actually
>>> > "parallel" between stages. I am not an expert on hardware and
>>> > simulators. I would really appreciate it if someone could help me
>>> > understand this.
>>> >
>>> >
>>>
>>> Can you explain to me what you mean by a pipeline and a stage in a
>>> pipeline? Further, you need to explain what you mean by hardware not
>>> being actually parallel between stages.
>>>
>>> In my opinion, one does not need to be an expert (as I define it for
>>> myself) to understand a pipelined cpu or a cpu simulator. These topics
>>> are usually part of undergraduate curriculum for computer engineering /
>>> science. You should read some under-graduate textbooks on designing
>>> digital circuits and computer architecture. It seems that would be more
>>> helpful rather than trying to understand how gem5 implements an
>>> out-of-order cpu.
>>>
>>> --
>>> Nilay
>>>
>>>
>>
>> _______________________________________________
>> gem5-users mailing list
>> [email protected]
>> http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
>>
>
