Hi,

I'm the one who wrote this patch.


*The opening comment in the patch states that it is trying to do
two things.  I would suggest that we split the patch.*

This patch was already touching the related code, and going out of its way
to handle "blocked" and "deferred" memory instructions differently from
each other seemed wrong: both are memory instructions being replayed, just
for different reasons.  A side effect of treating them uniformly is that
the resources are now properly modeled.
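To make the "treat them uniformly" point concrete, here is a minimal
standalone sketch (the names `RetryModel`, `replayCycle`, etc. are
illustrative, not gem5's actual classes): blocked and deferred memory
instructions share one retry queue, and replays per cycle are capped by
both issue width and the number of memory functional units, which is the
resource modeling the patch fixes.

```cpp
#include <algorithm>
#include <cassert>
#include <deque>
#include <string>
#include <vector>

// Hypothetical model, not gem5 code: a single retry queue for both
// "blocked" (cache busy) and "deferred" (conflicting store) loads.
struct MemInst { std::string name; };

struct RetryModel {
    std::deque<MemInst> retryQueue;  // unified queue, blocked + deferred
    unsigned issueWidth;             // max instructions issued per cycle
    unsigned memFUs;                 // memory functional units per cycle

    // Replay as many queued instructions as both limits allow this cycle.
    std::vector<MemInst> replayCycle() {
        unsigned budget = std::min(issueWidth, memFUs);
        std::vector<MemInst> issued;
        while (!retryQueue.empty() && issued.size() < budget) {
            issued.push_back(retryQueue.front());
            retryQueue.pop_front();
        }
        return issued;
    }
};
```

With three queued loads, an issue width of 4, and only 2 memory FUs, a
cycle replays 2 loads and leaves 1 queued, i.e. the FU count, not just the
issue width, bounds replay throughput.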






*I think we should not drop the original behaviour.  Firstly, it was not
incorrect.  Secondly, no reason has been provided as to why the behaviour
implemented should be preferred.  Are we sure that most out-of-order
processors would choose the proposed over the original?*


The existing replay logic in gem5 attempts to model that which was present
in the Alpha 21264 (
http://www.ece.cmu.edu/~ece447/s14/lib/exe/fetch.php?media=21264hrm.pdf).
 Specifically:

*"There are some situations in which a load or store instruction cannot be
executed due to a condition that occurs after that instruction issues from
the IQ or FQ. The instruction is aborted (along with all newer
instructions) and restarted from the fetch stage of the pipeline. This
mechanism is called a replay trap."*


However, it is doubtful that any modern processor would want this
behavior.  The current symptom is that o3 repeatedly re-fetches and
re-executes the same sequence of instructions until the cache finally
becomes unblocked.  This is more detrimental to performance and power
efficiency than even the P4's replay mechanism, which was so power-hungry
that it has its own Wikipedia entry (
http://en.wikipedia.org/wiki/Replay_system).  The P4 was more conservative
than gem5 in that it did not have to re-fetch.  Nowadays, due to power
considerations, replay events in modern processors are rarer and more
selective.  Mikko has a paper explaining replay mechanisms and the
importance of limited replay in modern processors (
http://pharm.ece.wisc.edu/papers/hpca2004ikim.pdf).

Though faithful to the 21264, the existing logic has already drawn
complaints multiple times on the mailing list and in Karu's WDDD paper.
Since gem5 is a research simulator and not a historical tribute to the
21264, I see little value in keeping the old semantics.  Doing so would
complicate the code and likely add infrequently tested code paths.
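To illustrate the cost difference the patch targets, here is a toy
back-of-the-envelope model (purely illustrative, with made-up function
names): under the old 21264-style trap, a blocked load squashes and
re-fetches everything after it, so the same work is redone every time the
load retries against a still-blocked cache; under the replay-on-unblock
approach, the load waits in the LSQ and executes exactly once more.

```cpp
#include <cassert>

// Hypothetical cost model, not gem5 code.  blockedCycles is how many
// retry attempts hit a still-blocked cache before it frees up.

// Old policy: each failed attempt re-fetches and re-executes the load
// (and everything younger), then one final successful execution.
unsigned executionsOldPolicy(unsigned blockedCycles) {
    return blockedCycles + 1;
}

// New policy: the load issues once, parks in the LSQ while the cache is
// blocked, and replays exactly once when it unblocks.
unsigned executionsNewPolicy(unsigned /*blockedCycles*/) {
    return 1;
}
```

The wasted work under the old policy grows linearly with how long the
cache stays blocked, while the new policy's cost is constant, which is
consistent with the >20% improvement seen on the microbenchmark below.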


Hope that clears up the reasoning behind dropping the old functionality.



On Sat, Aug 16, 2014 at 11:01 AM, Nilay Vaish via gem5-dev <
[email protected]> wrote:

>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://reviews.gem5.org/r/2332/#review5261
> -----------------------------------------------------------
>
>
> Two points that I would like to make:
> * The opening comment in the patch states that it is trying to do two
> things.  I would suggest that we split the patch.
>
> * I think we should not drop the original behaviour.  Firstly, it was not
> incorrect.
> Secondly, no reason has been provided as to why the behaviour implemented
> should be preferred.  Are we sure that most out-of-order processors would
> choose the proposed over the original?
>
> - Nilay Vaish
>
>
> On Aug. 13, 2014, 2:06 p.m., Andreas Hansson wrote:
> >
> > -----------------------------------------------------------
> > This is an automatically generated e-mail. To reply, visit:
> > http://reviews.gem5.org/r/2332/
> > -----------------------------------------------------------
> >
> > (Updated Aug. 13, 2014, 2:06 p.m.)
> >
> >
> > Review request for Default.
> >
> >
> > Repository: gem5
> >
> >
> > Description
> > -------
> >
> > Changeset 10300:bddebc19285f
> > ---------------------------
> > cpu: Fix cached block load behavior in o3 cpu
> >
> > This patch fixes the load blocked/replay mechanism in the o3 cpu.
> Rather than
> > flushing the entire pipeline, this patch replays loads once the cache
> becomes
> > unblocked.
> >
> > Additionally, deferred memory instructions (loads which had conflicting
> stores),
> > when replayed would not respect the number of functional units (only
> respected
> > issue width).  This patch also corrects that.
> >
> > Improvements over 20% have been observed on a microbenchmark designed to
> > exercise this behavior.
> >
> >
> > Diffs
> > -----
> >
> >   src/cpu/o3/iew.hh 79fde1c67ed8
> >   src/cpu/o3/iew_impl.hh 79fde1c67ed8
> >   src/cpu/o3/inst_queue.hh 79fde1c67ed8
> >   src/cpu/o3/inst_queue_impl.hh 79fde1c67ed8
> >   src/cpu/o3/lsq.hh 79fde1c67ed8
> >   src/cpu/o3/lsq_impl.hh 79fde1c67ed8
> >   src/cpu/o3/lsq_unit.hh 79fde1c67ed8
> >   src/cpu/o3/lsq_unit_impl.hh 79fde1c67ed8
> >   src/cpu/o3/mem_dep_unit.hh 79fde1c67ed8
> >   src/cpu/o3/mem_dep_unit_impl.hh 79fde1c67ed8
> >
> > Diff: http://reviews.gem5.org/r/2332/diff/
> >
> >
> > Testing
> > -------
> >
> >
> > Thanks,
> >
> > Andreas Hansson
> >
> >
>
> _______________________________________________
> gem5-dev mailing list
> [email protected]
> http://m5sim.org/mailman/listinfo/gem5-dev
>