> On Aug. 16, 2014, 4:01 p.m., Nilay Vaish wrote:
> > Two points that I would like to make:
> > * The opening comment in the patch states that it is trying to do two
> > things.  I would suggest that we split the patch.
> > 
> > * I think we should not drop the original behaviour.  Firstly, it was not
> > incorrect.  Secondly, no reason has been provided as to why the behaviour
> > implemented should be preferred.  Are we sure that most out-of-order
> > processors would choose the proposed behaviour over the original?
> 
> Stephan Diestelhorst wrote:
>     In addition to Mitch's comments on the ML (relating to not modelling an
>     old Alpha), the behaviour here is more in line with what other simulators
>     (*cough* PTLsim, MARSSx86 *cough*) provide.  IIRC, the respective
>     justification there also cites some P4-era patents, but I have never
>     reviewed those.
>     
>     Would it make sense to take a small real-world example and measure and
>     compare performance and the relevant performance counters?  Intuitively,
>     I'd agree that modern uarches go to considerable lengths to prevent
>     replays and to handle blocking more gracefully.

If you can provide such an example, that would be terrific.


- Nilay


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/2332/#review5261
-----------------------------------------------------------


On Aug. 13, 2014, 2:06 p.m., Andreas Hansson wrote:
> 
> 
> (Updated Aug. 13, 2014, 2:06 p.m.)
> 
> 
> Review request for Default.
> 
> 
> Repository: gem5
> 
> 
> Description
> -------
> 
> Changeset 10300:bddebc19285f
> ---------------------------
> cpu: Fix cache blocked load behavior in o3 cpu
> 
> This patch fixes the load blocked/replay mechanism in the o3 cpu.  Rather than
> flushing the entire pipeline, this patch replays loads once the cache becomes
> unblocked.
> 
> Additionally, deferred memory instructions (loads which had conflicting
> stores) would not respect the number of functional units when replayed (only
> the issue width was respected).  This patch also corrects that.
> 
> Improvements over 20% have been observed on a microbenchmark designed to
> exercise this behavior.
> 
> 
> Diffs
> -----
> 
>   src/cpu/o3/iew.hh 79fde1c67ed8 
>   src/cpu/o3/iew_impl.hh 79fde1c67ed8 
>   src/cpu/o3/inst_queue.hh 79fde1c67ed8 
>   src/cpu/o3/inst_queue_impl.hh 79fde1c67ed8 
>   src/cpu/o3/lsq.hh 79fde1c67ed8 
>   src/cpu/o3/lsq_impl.hh 79fde1c67ed8 
>   src/cpu/o3/lsq_unit.hh 79fde1c67ed8 
>   src/cpu/o3/lsq_unit_impl.hh 79fde1c67ed8 
>   src/cpu/o3/mem_dep_unit.hh 79fde1c67ed8 
>   src/cpu/o3/mem_dep_unit_impl.hh 79fde1c67ed8 
> 
> Diff: http://reviews.gem5.org/r/2332/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Andreas Hansson
> 
>

_______________________________________________
gem5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/gem5-dev
