Thank you, Mitch. I just came back to my computer after thinking through why the order in which pipeline stages are ticked does not matter when a time buffer is used. Your email arrived just in time: it confirms what I had worked out and gives me more insight into the simulator design. I really appreciate it.
Best,
Chen

On Sat, Jan 26, 2013 at 6:23 PM, Mitch Hayenga <[email protected]> wrote:

> "If both answers are t-1, which means the output of any stage only depends
> on some other stages' output at the previous cycle, then I can understand
> why a time buffer can get rid of the dependencies. However, if a stage
> requires a result from another stage at the same cycle, I cannot see how
> this works. Maybe hardware never does that"
>
> This is correct. In general, pipestages in gem5 should only use
> timebuffers to communicate with each other. Otherwise dependencies would
> form, requiring them to be clocked in a specific order. So timebuffers are
> only used to communicate across cycles (and never between two stages on
> the same cycle).
>
> Let's explain timebuffers this way... From SimpleScalar you are used to
> doing things like "clocking decode before fetch". The reason you do this
> is to empty out decode's queue of instructions before allowing fetch to
> refill it. If you had clocked them in the other order, your simulator
> might "cheat" and allow things fetched on the current cycle to make their
> way through decode as well. Basically, you need some guarantee of how
> values are passed around, and clocking stages in reverse lets you do this
> with existing structures without "cheating".
>
> Instead of doing things this way, pipestages in gem5 do not write directly
> to another pipestage's storage structures. Instead, think of each
> pipestage as keeping a "queue of its outputs for the last n cycles". That
> is essentially what the timebuffer is: a queue of outputs generated by a
> pipestage over the last n cycles. So decode will typically look at
> whatever fetch produced on the previous cycle, indexing by -1 (the last
> cycle) into the respective timebuffer.
>
> Now this probably looks over-complicated to you. Why should a pipestage
> keep a queue of its outputs (for however many cycles)?
> Why not just have a single storage place for the outputs of fetch? The
> reason gem5 does this is that it lets us fake longer/more detailed
> pipelines easily. Pretend we are simulating a machine with a deep pipeline
> (3-cycle fetch): F1, F2, F3, followed by Decode. A timebuffer lets us fake
> this deep pipeline without changing much of our logic. The fetch and
> decode pipeline stages stay the same, but we just tell decode to index by
> -3 into the timebuffer of results from fetch, so results show up at decode
> 3 cycles after they have been fetched. Thus we effectively don't care
> about the order in which pipeline stages are clocked, and we can fake
> pipelines with an arbitrary number of pipestages.
>
> Hope this clears up some of your confusion.
>
> On Sat, Jan 26, 2013 at 7:53 PM, Chen Tian <[email protected]> wrote:
>
>> Hi Nilay,
>>
>> I don't have any trouble understanding the concept of a pipeline as
>> defined in any textbook, the implementation of SimpleScalar, or even the
>> InOrder CPU model in gem5, where you update a stage and notify an
>> earlier stage at the same tick (so you go backwards). But today, when I
>> looked at the two-page slides on the time buffer mentioned by Mitch and
>> tried to understand it, I was lost. If my question looks silly to you,
>> sorry about that. Having read this thread one more time, I think what I
>> still have not grasped is how, in Andreas' words, "the update and the
>> notification are separated in time" by using a time buffer.
>>
>> Chen
>>
>> On Sat, Jan 26, 2013 at 1:59 PM, Nilay <[email protected]> wrote:
>>
>>> On Sat, January 26, 2013 12:01 pm, Chen Tian wrote:
>>> > Thanks everyone for your replies. I have a better understanding but
>>> > still have questions.
>>> >
>>> > Let's consider a time buffer B between two consecutive pipeline
>>> > stages X and Y. When computing Y's output at cycle t, do we need the
>>> > signal passed from X at t or t-1 (i.e., the struct in B with index t
>>> > or t-1)?
>>> > Similarly, when computing X's output at cycle t, do we need to look
>>> > at the status of Y at cycle t or t-1 (e.g., whether some hw resource
>>> > is available for this cycle)? If both answers are t-1, which means
>>> > the output of any stage only depends on some other stages' output at
>>> > the previous cycle, then I can understand why a time buffer can get
>>> > rid of the dependencies. However, if a stage requires a result from
>>> > another stage at the same cycle, I cannot see how this works. Maybe
>>> > hardware never does that -- as it is not actually "parallel" between
>>> > stages. I am not an expert on hardware or simulators. I would really
>>> > appreciate it if someone could help me understand this.
>>> >
>>>
>>> Can you explain to me what you mean by a pipeline and a stage in a
>>> pipeline? Further, you need to explain what you mean by hardware not
>>> being actually parallel between stages.
>>>
>>> In my opinion, one does not need to be an expert (as I define it for
>>> myself) to understand a pipelined CPU or a CPU simulator. These topics
>>> are usually part of the undergraduate curriculum for computer
>>> engineering / science. You should read some undergraduate textbooks on
>>> designing digital circuits and computer architecture. That would be
>>> more helpful than trying to understand how gem5 implements an
>>> out-of-order CPU.
>>>
>>> --
>>> Nilay
>>>
>>
>> _______________________________________________
>> gem5-users mailing list
>> [email protected]
>> http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
