Re: [gem5-users] documents on O3 cpu implementation?

Chen Tian Sat, 26 Jan 2013 20:30:09 -0800

I had not spent much time with simulators before. However, my roughly
10 days of experience with gem5 tells me that this simulator is great. It is
an honor to be part of the project or to be able to answer questions on the
mailing list. Being polite in both asking and answering questions will
certainly help the gem5 community grow.


Now it's time to close this thread and enjoy the rest of the weekend. :-)

Best,
Chen


On Sat, Jan 26, 2013 at 7:00 PM, Mitch Hayenga <[email protected]> wrote:

> Nilay,
>
> Stop talking down to this guy like he is someone without any
> understanding. Granted, this email list gets a lot of "dumb" questions.
> But you jump to conclusions too fast and are too condescending to people
> on this email list. This guy is a PhD graduate with many publications
> (http://www.cs.ucr.edu/~tianc/).
>
>
>
> On Sat, Jan 26, 2013 at 8:23 PM, Mitch Hayenga <[email protected]> wrote:
>
>> "If both answers are t-1, which means the output of any stage only
>> depends on some other stages' output at the previous cycle, then I can
>> understand why the time buffer can get rid of the dependencies. However, if
>> a stage requires a result from another stage in the same cycle, I cannot
>> see how this works. Maybe hardware never does that"
>>
>> This is correct.  In general, pipestages in gem5 should only use
>> timebuffers to communicate with each other.  Otherwise dependencies would
>> form, requiring them to be clocked in a specific order.  So timebuffers are
>> only used to communicate across cycles (and never between two stages on the
>> same cycle).
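A minimal Python sketch of this rule, covering both directions asked about above: a forward buffer carries instructions from fetch to decode, and a backward buffer carries a hypothetical stall flag from decode back to fetch. Each buffer is modeled as a two-slot `deque` (slot 0 is written this cycle, slot 1 holds the previous cycle, i.e. what a time buffer calls index -1). This is an illustration under those assumptions, not gem5 code, and the stall condition is made up.

```python
from collections import deque


def simulate(order, cycles=6, stall_cycles=(2,)):
    """Clock fetch and decode in `order`; return decode's (cycle, inst) log.

    Slot 0 of each buffer is this cycle's output, slot 1 is the previous
    cycle's output (index -1 in time-buffer terms).
    """
    fwd = deque([None, None], maxlen=2)    # fetch -> decode: instruction
    bwd = deque([False, False], maxlen=2)  # decode -> fetch: stall request
    next_pc = 0
    log = []

    def fetch(cycle):
        nonlocal next_pc
        if bwd[1]:                        # decode's stall from LAST cycle
            return
        fwd[0] = f"inst{next_pc}"
        next_pc += 1

    def decode(cycle):
        bwd[0] = cycle in stall_cycles    # hypothetical: ask fetch to pause
        if fwd[1] is not None:            # fetch's output from LAST cycle
            log.append((cycle, fwd[1]))

    stages = {"fetch": fetch, "decode": decode}
    for cycle in range(cycles):
        for name in order:
            stages[name](cycle)
        fwd.appendleft(None)              # advance both buffers one cycle
        bwd.appendleft(False)
    return log


print(simulate(["fetch", "decode"]))
# [(1, 'inst0'), (2, 'inst1'), (3, 'inst2'), (5, 'inst3')]
print(simulate(["fetch", "decode"]) == simulate(["decode", "fetch"]))  # True
```

Because each stage only reads slot 1 and only writes slot 0, swapping the clock order cannot change the result. The stall that decode raises in cycle 2 reaches fetch in cycle 3, so nothing arrives at decode in cycle 4.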
>>
>> Let's explain timebuffers this way. From SimpleScalar, you are used to
>> doing things like "clocking decode before fetch". The reason you do this
>> is to empty out decode's queue of instructions before allowing fetch to
>> refill it. If you clocked them in the other order, your simulator might
>> "cheat" and allow things fetched on the current cycle to make their way
>> through decode as well. Basically, you need some guarantee about how
>> values are passed around, and clocking stages in reverse order gives you
>> that guarantee with existing structures, without "cheating".
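To make the "cheating" concrete, here is a small Python sketch in the SimpleScalar style rather than the gem5 style, assuming a single shared latch (a hypothetical name) that fetch fills directly and decode drains directly. Clocked decode-first, an instruction needs one cycle to cross the latch; clocked fetch-first, it is fetched and decoded in the same cycle.

```python
def run_shared_latch(order, cycles=3):
    """Clock two stages that share one latch directly; return decode's log."""
    latch = []      # fetch appends, decode drains; no per-cycle history
    log = []
    next_pc = 0

    def fetch(cycle):
        nonlocal next_pc
        if not latch:           # refill only once decode has emptied it
            latch.append(f"inst{next_pc}")
            next_pc += 1

    def decode(cycle):
        if latch:
            log.append((cycle, latch.pop(0)))

    stages = {"fetch": fetch, "decode": decode}
    for cycle in range(cycles):
        for name in order:
            stages[name](cycle)
    return log


print(run_shared_latch(["decode", "fetch"]))  # [(1, 'inst0'), (2, 'inst1')]
print(run_shared_latch(["fetch", "decode"]))  # [(0, 'inst0'), (1, 'inst1'), (2, 'inst2')]
```

The second run shows inst0 decoded in cycle 0, the same cycle it was fetched, which is exactly the ordering dependence that reverse clocking avoids.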
>>
>> Rather than doing things this way, pipestages in gem5 do not write
>> directly to another pipestage's storage structures. Instead, think of each
>> pipestage as keeping a "queue of its outputs for the last n cycles". This
>> is essentially what the "timebuffer" is: a queue of outputs generated by a
>> pipestage over the last n cycles. So decode will typically look at
>> whatever fetch produced on the previous cycle, i.e., it will index by -1
>> (the last cycle) into the respective timebuffer.
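The "queue of outputs for the last n cycles" can be sketched as a fixed-size ring of slots. This Python version is only an illustration: gem5's own `TimeBuffer` is a C++ class template with its own interface, and the method names below (`write`, `read`, `advance`) are hypothetical. Index 0 is the slot the producing stage fills this cycle, and index -k is what it produced k cycles ago.

```python
class TimeBuffer:
    """Hypothetical sketch: keeps a stage's outputs for the last `past` cycles."""

    def __init__(self, past):
        self.past = past
        self.size = past + 1
        self.slots = [None] * self.size
        self.base = 0                    # ring position of index 0

    def _pos(self, idx):
        if not -self.past <= idx <= 0:
            raise IndexError(f"index {idx} is outside the buffered history")
        return (self.base + idx) % self.size

    def write(self, value):
        """Producer stores this cycle's output at index 0."""
        self.slots[self._pos(0)] = value

    def read(self, idx):
        """Consumer reads the output produced -idx cycles ago."""
        return self.slots[self._pos(idx)]

    def advance(self):
        """End of cycle: rotate and clear the slot reused for the new cycle."""
        self.base = (self.base + 1) % self.size
        self.slots[self.base] = None


fetch_out = TimeBuffer(past=1)
fetch_out.write("inst0")        # fetch, cycle 0
print(fetch_out.read(-1))       # decode, cycle 0: None (nothing yet)
fetch_out.advance()
print(fetch_out.read(-1))       # decode, cycle 1: inst0
```

Rotating a base index instead of shifting the slots keeps `advance` constant-time no matter how much history is kept.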
>>
>> Now this probably looks overcomplicated to you. Why should a pipestage
>> keep something like a queue of its outputs (for however many cycles)? Why
>> not just have a single storage place for the outputs of fetch? The reason
>> gem5 does this is that it lets us easily fake longer, more detailed
>> pipestages. Pretend we are simulating a machine with a deep pipeline (a
>> 3-cycle fetch): F1, F2, F3, followed by Decode. A timebuffer lets us fake
>> this deep pipeline without changing much of our logic. The fetch and
>> decode pipeline stages stay the same; we just tell decode to index by -3
>> into the timebuffer of results from fetch. So results show up at decode 3
>> cycles after they have been fetched. Effectively, we don't care about the
>> order in which pipeline stages are clocked, and we can fake pipelines with
>> an arbitrary number of pipestages.
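Faking a deeper front end then only changes the index decode reads at. In this Python sketch the hypothetical `fetch_to_decode_delay` argument plays that role (gem5's O3 model has delay parameters of this kind, such as `fetchToDecodeDelay`, but this is not its code). The fetch and decode logic are unchanged; decode simply reads the slot written `delay` cycles ago.

```python
from collections import deque


def decode_log(fetch_to_decode_delay, cycles=8):
    """Return (inst, fetch_cycle, decode_cycle) for each decoded instruction."""
    if fetch_to_decode_delay < 1:
        raise ValueError("delay must be at least one cycle")
    # history[0] is this cycle's slot; history[k] was written k cycles ago.
    history = deque([None] * (fetch_to_decode_delay + 1),
                    maxlen=fetch_to_decode_delay + 1)
    log = []
    for cycle in range(cycles):
        # fetch: write this cycle's output (tagged with its fetch cycle)
        history[0] = (f"inst{cycle}", cycle)
        # decode: read index -delay; a different slot, so order is irrelevant
        entry = history[fetch_to_decode_delay]
        if entry is not None:
            name, fetched_at = entry
            log.append((name, fetched_at, cycle))
        history.appendleft(None)         # advance: age every slot by one
    return log


for delay in (1, 3):
    print(delay, decode_log(delay, cycles=5)[:2])
# 1 [('inst0', 0, 1), ('inst1', 1, 2)]
# 3 [('inst0', 0, 3), ('inst1', 1, 4)]
```

Going from a 1-cycle to a 3-cycle front end changed only the constructor argument, which is the point of keeping per-cycle history instead of a single output latch.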
>>
>> Hope this clears up some of your confusion.
>>
>> On Sat, Jan 26, 2013 at 7:53 PM, Chen Tian <[email protected]> wrote:
>>
>>> Hi Nilay,
>>>
>>> I don't have any trouble understanding either the concept of a pipeline
>>> as defined in any textbook or the implementation of SimpleScalar, or even
>>> the InOrder CPU model in GEM5, where you update a stage and notify an
>>> earlier stage at the same tick (so you go backwards). Just today, when I
>>> looked at the two-page slides on the time buffer mentioned by Mitch and
>>> tried to understand them, I was lost. If my question looks silly to you,
>>> sorry about that. Reading this thread one more time, I think what I have
>>> not grasped is how, in Andreas' words, "the update and the notification
>>> are separated in time" by using a time buffer.
>>>
>>> Chen
>>>
>>> On Sat, Jan 26, 2013 at 1:59 PM, Nilay <[email protected]> wrote:
>>>
>>>> On Sat, January 26, 2013 12:01 pm, Chen Tian wrote:
>>>> > Thanks everyone for your replies. I have a better understanding but
>>>> > still have questions.
>>>> >
>>>> > Let's consider a time buffer B between two consecutive pipeline stages
>>>> > X and Y. When computing Y's output at cycle t, do we need the signal
>>>> > passed from X at t or at t-1 (i.e., the struct in B with index t or
>>>> > t-1)? Similarly, when computing X's output at cycle t, do we need to
>>>> > look at the status of Y at cycle t or t-1 (e.g., whether some hardware
>>>> > resource is available for this cycle)? If both answers are t-1, which
>>>> > means the output of any stage only depends on some other stages' output
>>>> > at the previous cycle, then I can understand why the time buffer can get
>>>> > rid of the dependencies. However, if a stage requires a result from
>>>> > another stage in the same cycle, I cannot see how this works. Maybe
>>>> > hardware never does that, as it is not actually "parallel" between
>>>> > stages. I am not an expert on hardware and simulators. I would really
>>>> > appreciate it if someone could help me understand this.
>>>> >
>>>>
>>>> Can you explain to me what you mean by a pipeline and a stage in a
>>>> pipeline? Further, you need to explain what you mean by hardware not
>>>> being actually parallel between stages.
>>>>
>>>> In my opinion, one does not need to be an expert (as I define it for
>>>> myself) to understand a pipelined CPU or a CPU simulator. These topics
>>>> are usually part of the undergraduate curriculum in computer
>>>> engineering / science. You should read some undergraduate textbooks on
>>>> designing digital circuits and computer architecture. That would
>>>> probably be more helpful than trying to understand how gem5 implements
>>>> an out-of-order CPU.
>>>>
>>>> --
>>>> Nilay
>>>>
>>>>
>>>
>>> _______________________________________________
>>> gem5-users mailing list
>>> [email protected]
>>> http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
>>>
>>
>>
>