Hi, I'm the one who wrote this patch.
** The opening comment in the patch states that it is trying to do twothings. I would suggest that we split the patch.* Related code was already being changed by this patch, and going out of the way to handle "blocked" and "deferred" memory instructions differently than each other seemed wrong. They are both cases of memory instructions replayed for different reasons. A side effect of treating them the same is that the resources are now properly modeled. ** I think we should not drop the original behaviour. Firstly, it was not incorrect.Secondly, no reason has been provided as to why the behaviour implementedshould be preferred. Are we sure that most out-of-order processors wouldchoose the proposed over the original?* The existing replay logic in gem5 attempts to model that which was present in the Alpha 21264 ( http://www.ece.cmu.edu/~ece447/s14/lib/exe/fetch.php?media=21264hrm.pdf). Specifically: *"There are some situations in which a load or store instruction cannot be executed due to a condition that occurs after that instruction issues from the IQ or FQ. The instruction is aborted (along with all newer instructions) and restarted from the fetch stage of the pipeline. This mechanism is called a replay trap."* However, it is doubtful that any modern processor would desire this behavior. The current symptom is that o3 repeatedly re-fetches and executes the same sequence of instructions unnecessarily multiple times until the cache finally becomes unblocked. This behavior is more detrimental to performance and power efficiency than even the P4's replay mechanism which was so power hungry, it has its own wikipedia entry ( http://en.wikipedia.org/wiki/Replay_system). The P4 was more conservative than gem5 in that it didn't have to re-fetch. Nowadays, due to power considerations, replay events are rarer and more selective in modern processors. Mikko previously had a paper explaining replay mechanisms and the importance of limited replay in modern processors ( http://pharm.ece.wisc.edu/papers/hpca2004ikim.pdf). Though truthful to the 21264, complaints about the existing logic have already been brought up multiple times on the mailing list and in Karu's WDDD paper. Since gem5 is a research simulator and not a historical tribute to the 21264, I see little value in keeping the old semantics. It would add complication to the code and likely add infrequently tested code paths. Hope that clears up the reasoning behind dropping the old functionality. On Sat, Aug 16, 2014 at 11:01 AM, Nilay Vaish via gem5-dev < [email protected]> wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > http://reviews.gem5.org/r/2332/#review5261 > ----------------------------------------------------------- > > > Two points that I would like to make: > * The opening comment in the patch states that it is trying to do two > things. I would suggest that we split the patch. > > * I think we should not drop the original behaviour. Firstly, it was not > incorrect. > Secondly, no reason has been provided as to why the behaviour implemented > should be preferred. Are we sure that most out-of-order processors would > choose the proposed over the original? > > - Nilay Vaish > > > On Aug. 13, 2014, 2:06 p.m., Andreas Hansson wrote: > > > > ----------------------------------------------------------- > > This is an automatically generated e-mail. To reply, visit: > > http://reviews.gem5.org/r/2332/ > > ----------------------------------------------------------- > > > > (Updated Aug. 13, 2014, 2:06 p.m.) > > > > > > Review request for Default. > > > > > > Repository: gem5 > > > > > > Description > > ------- > > > > Changeset 10300:bddebc19285f > > --------------------------- > > cpu: Fix cached block load behavior in o3 cpu > > > > This patch fixes the load blocked/replay mechanism in the o3 cpu. > Rather than > > flushing the entire pipeline, this patch replays loads once the cache > becomes > > unblocked. > > > > Additionally, deferred memory instructions (loads which had conflicting > stores), > > when replayed would not respect the number of functional units (only > respected > > issue width). This patch also corrects that. > > > > Improvements over 20% have been observed on a microbenchmark designed to > > exercise this behavior. > > > > > > Diffs > > ----- > > > > src/cpu/o3/iew.hh 79fde1c67ed8 > > src/cpu/o3/iew_impl.hh 79fde1c67ed8 > > src/cpu/o3/inst_queue.hh 79fde1c67ed8 > > src/cpu/o3/inst_queue_impl.hh 79fde1c67ed8 > > src/cpu/o3/lsq.hh 79fde1c67ed8 > > src/cpu/o3/lsq_impl.hh 79fde1c67ed8 > > src/cpu/o3/lsq_unit.hh 79fde1c67ed8 > > src/cpu/o3/lsq_unit_impl.hh 79fde1c67ed8 > > src/cpu/o3/mem_dep_unit.hh 79fde1c67ed8 > > src/cpu/o3/mem_dep_unit_impl.hh 79fde1c67ed8 > > > > Diff: http://reviews.gem5.org/r/2332/diff/ > > > > > > Testing > > ------- > > > > > > Thanks, > > > > Andreas Hansson > > > > > > _______________________________________________ > gem5-dev mailing list > [email protected] > http://m5sim.org/mailman/listinfo/gem5-dev > _______________________________________________ gem5-dev mailing list [email protected] http://m5sim.org/mailman/listinfo/gem5-dev
