> On Aug. 16, 2014, 4:01 p.m., Nilay Vaish wrote:
> > Two points that I would like to make:
> > * The opening comment in the patch states that it is trying to do two
> >   things. I would suggest that we split the patch.
> >
> > * I think we should not drop the original behaviour. Firstly, it was not
> >   incorrect. Secondly, no reason has been provided as to why the behaviour
> >   implemented should be preferred. Are we sure that most out-of-order
> >   processors would choose the proposed over the original?
>
> Stephan Diestelhorst wrote:
>     In addition to Mitch's comments on the ML (relating to not modelling an
>     old Alpha), the behaviour here is more in-line with what other simulators
>     (*cough* PTLsim, MARSSx86 *cough*) provide. IIRC, the respective
>     justification there also cites some P4-era patents, but I have never
>     reviewed those.
>
>     Would it make sense to have a small real-world example and measure /
>     compare performance / relevant performance counters? Intuitively, I'd
>     agree that modern uarches go to quite some length to prevent replays /
>     handle blocking more gracefully.

If you can provide such an example, that would be terrific.

- Nilay


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/2332/#review5261
-----------------------------------------------------------


On Aug. 13, 2014, 2:06 p.m., Andreas Hansson wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://reviews.gem5.org/r/2332/
> -----------------------------------------------------------
>
> (Updated Aug. 13, 2014, 2:06 p.m.)
>
>
> Review request for Default.
>
>
> Repository: gem5
>
>
> Description
> -------
>
> Changeset 10300:bddebc19285f
> ---------------------------
> cpu: Fix cached block load behavior in o3 cpu
>
> This patch fixes the load blocked/replay mechanism in the o3 cpu. Rather than
> flushing the entire pipeline, this patch replays loads once the cache becomes
> unblocked.
>
> Additionally, deferred memory instructions (loads which had conflicting
> stores), when replayed would not respect the number of functional units
> (only respected issue width). This patch also corrects that.
>
> Improvements over 20% have been observed on a microbenchmark designed to
> exercise this behavior.
>
>
> Diffs
> -----
>
>   src/cpu/o3/iew.hh 79fde1c67ed8
>   src/cpu/o3/iew_impl.hh 79fde1c67ed8
>   src/cpu/o3/inst_queue.hh 79fde1c67ed8
>   src/cpu/o3/inst_queue_impl.hh 79fde1c67ed8
>   src/cpu/o3/lsq.hh 79fde1c67ed8
>   src/cpu/o3/lsq_impl.hh 79fde1c67ed8
>   src/cpu/o3/lsq_unit.hh 79fde1c67ed8
>   src/cpu/o3/lsq_unit_impl.hh 79fde1c67ed8
>   src/cpu/o3/mem_dep_unit.hh 79fde1c67ed8
>   src/cpu/o3/mem_dep_unit_impl.hh 79fde1c67ed8
>
> Diff: http://reviews.gem5.org/r/2332/diff/
>
>
> Testing
> -------
>
>
> Thanks,
>
> Andreas Hansson
>
>
