On Sat, Sep 12, 2015 at 7:31 AM Andreas Hansson <[email protected]>
wrote:

>
>
> > On Sept. 12, 2015, 1:54 p.m., Andreas Hansson wrote:
> > > My comment seems to have gotten lost in all the different threads
> > > going on... bad S/N. Anyways, here it is:
> > >
> > > I am of the opinion that we should probably 1) do read/write
> > > combining in the core LSQ before sending out a packet, and 2)
> > > combining of MSHR targets in the L1 before propagating a miss
> > > downwards. I am not sure why we would ever do it in the Sequencer.
> > > Am I missing something?
> > >
> > > This solution would also translate very well between both Ruby and
> > > the classic memory system.
> >
> > Joel Hestness wrote:
> >     Hi Andreas. I think we all agree with you about where coalescing
> >     should happen. It appears that (1) is available from particular
> >     cores (e.g. the O3 CPU). The problem currently is that getting
> >     (2) in Ruby would require very non-trivial modification to the
> >     L1 controllers (to do combining) in each of the existing
> >     protocols (7 in gem5 + at least 2 not yet in gem5). To avoid all
> >     this protocol modification, this patch is AMD's proposal to do
> >     L1 MSHR-like combining within the sequencer. This proposed
> >     solution should be viewed as a stopgap on the road to MSHR
> >     combining in the L1 controllers.
>
> I see. Thanks for the clarification.
>
> I am fairly convinced the O3 CPU is not doing any coalescing at the
> moment. Are you sure? If not, I'd say that's the place to start.
> Coalescing
> in the LSQ is probably as important as coalescing of MSHR targets, if not
> more so.
>

Forking this off into a separate thread since it's not even one of the
issues that Joel brought up when he tried to organize things...

Andreas, can you clarify what you mean by adding coalescing in the O3
LSQ? I'm not sure exactly what you're proposing, and I have some
concerns about both how it would work and what it would save us.

First, are you talking about having the LSQ track memory accesses at
cache-block granularity and forward only a single access per cache
block to the L1 cache?  What happens then when I have multiple read
accesses to a
cache block, but the block arrives with a pending invalidation?  In the
current cache we would invalidate the block after completing the one
outstanding access the L1 is aware of, which means that the subsequent
accesses from the LSQ (forwarded after the first hit completes) would be
unnecessary misses.  You could fix this by hanging on to the last block
in the cache even after acking the invalidation, as long as you had
additional accesses to the same block pending (actually discarding the
block only on the first access to a different block), but that would
not simplify the cache access code path :).
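
To make concrete what I'm picturing (purely a sketch in plain C++, not
actual gem5 code; CoalescingLSQ, PendingLoad, issueToL1 and friends are
names I just made up), the bookkeeping would look something like this,
and the comment in blockArrived() marks where the invalidation race
shows up:

// A minimal sketch (assumed names, not gem5's actual LSQ code) of
// block-granularity coalescing in the LSQ: pending loads are grouped
// by cache-block address, and only the first access per block is
// forwarded to the L1.
#include <cstdint>
#include <map>
#include <vector>

static constexpr uint64_t BlockBytes = 64;

struct PendingLoad {
    uint64_t vaddr;  // byte address of the access
    unsigned size;   // access size in bytes
};

static uint64_t blockAddr(uint64_t addr) { return addr & ~(BlockBytes - 1); }

class CoalescingLSQ {
  public:
    // Queue a load; forward it to the L1 only if no other access to
    // the same block is already outstanding.
    void queueLoad(const PendingLoad &ld) {
        auto &waiters = pendingByBlock[blockAddr(ld.vaddr)];
        waiters.push_back(ld);
        if (waiters.size() == 1)
            issueToL1(blockAddr(ld.vaddr));  // first access: one request
        // otherwise piggyback on the request already in flight
    }

    // Called when the L1 responds for a block: drain *all* queued
    // accesses to that block.  This is where the race above bites: if
    // the response carries a pending invalidation, the L1 knows about
    // only one access, while the LSQ still has several to satisfy
    // before the block can safely be given up.
    void blockArrived(uint64_t blkAddr) {
        for (const auto &ld : pendingByBlock[blkAddr])
            complete(ld);
        pendingByBlock.erase(blkAddr);
    }

  private:
    void issueToL1(uint64_t blkAddr) { /* send one request downstream */ }
    void complete(const PendingLoad &ld) { /* writeback to regfile */ }

    std::map<uint64_t, std::vector<PendingLoad>> pendingByBlock;
};

Stores and partial-block ordering make the real thing messier, of
course; this just shows the structure I'm worried about.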

Second, I'm confused about how this would help, given that we need to do
coalescing at any shared downstream (e.g., L2) cache anyway, to combine
multiple accesses to the same block from different upstream (e.g., L1)
caches.  If pushing coalescing up into the LSQ doesn't let us remove it
from the cache model, it doesn't seem like a big win to me.  I understand
that it would be a win on the Ruby side, given the current situation there,
but that raises the question in my mind of how Ruby itself deals with
multiple coalescable accesses to shared downstream caches, and why that
solution can't be applied to the Ruby L1 caches.
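
Just so we're talking about the same mechanism, here's the shape of the
target combining I think the shared cache needs regardless, a rough
sketch in the spirit of the classic cache's MSHRs (Mshr, RequestFromL1,
etc. are illustrative names, not the real gem5 classes):

// A rough sketch of MSHR-target combining at a shared cache: two
// different L1s missing on the same block must be attached to one
// MSHR so that exactly one fill request goes downstream.
#include <cstdint>
#include <map>
#include <vector>

struct RequestFromL1 {
    int srcL1;         // which upstream cache sent this miss
    uint64_t blkAddr;  // block-aligned address
};

struct Mshr {
    uint64_t blkAddr;
    std::vector<RequestFromL1> targets;  // everyone waiting on this fill
};

class SharedCache {
  public:
    void handleMiss(const RequestFromL1 &req) {
        auto it = mshrs.find(req.blkAddr);
        if (it != mshrs.end()) {
            // Another L1 already missed on this block: coalesce as an
            // extra target instead of sending a second fill request.
            it->second.targets.push_back(req);
        } else {
            mshrs.emplace(req.blkAddr, Mshr{req.blkAddr, {req}});
            sendDownstream(req.blkAddr);  // exactly one fill per block
        }
    }

    void fillArrived(uint64_t blkAddr) {
        for (const auto &t : mshrs[blkAddr].targets)
            respondTo(t);  // satisfy every coalesced target
        mshrs.erase(blkAddr);
    }

  private:
    void sendDownstream(uint64_t blkAddr) { /* fetch block from below */ }
    void respondTo(const RequestFromL1 &t) { /* reply to upstream L1 */ }

    std::map<uint64_t, Mshr> mshrs;
};

That handleMiss() path has to exist at the L2 whether or not the LSQ
coalesces, which is the crux of my second point.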

Thanks,

Steve