Hi All,

At the bottom of Steve's message, he wrote " If pushing coalescing up into the 
LSQ doesn't let us remove it from the cache model, it doesn't seem like a big 
win to me.  I understand that it would be a win on the Ruby side, given the 
current situation there, but that raises the question in my mind of how Ruby 
deals with multiple coalescable accesses to shared downstream caches itself, 
and why can't that solution be applied to the Ruby L1 caches?"

I just want to clarify that I do not want to push the coalescing into the LSQ.  
The packets generated by the CPU have always been per-scalar-instruction, 
per-request, and per-address, and I want to keep them that way.  The Ruby 
Sequencer contains protocol-independent logic that is logically part of the L1 
cache, such as coalescing requests from a single thread.

To answer Steve's question, Ruby does allow coalescing in downstream shared 
caches, but it is protocol dependent.  Each protocol dictates what coalescing 
across multiple threads is permissible (for instance, multiple writes can merge 
under weaker memory models, but not under stronger ones).  Thus it is not 
something we can do in a protocol-independent fashion.
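To make the protocol-independent part concrete, here is a deliberately small 
sketch of MSHR-style coalescing keyed on the cache-line-aligned address, in the 
spirit of what the Sequencer does for a single thread: the first request to a 
line is issued downstream, and later requests to the same line are attached as 
targets and completed together when the line returns.  All names here 
(`Coalescer`, `request`, `fill`) are illustrative, not gem5's actual Sequencer 
API.

```cpp
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical sketch of single-thread, MSHR-like request coalescing.
struct Coalescer {
    static constexpr uint64_t kLineBytes = 64;

    // In-flight lines -> request ids waiting on that line.
    std::map<uint64_t, std::vector<int>> outstanding;
    int issuedMisses = 0;

    // Returns true if the request had to be issued downstream,
    // false if it was coalesced onto an already-in-flight miss.
    bool request(int id, uint64_t addr) {
        uint64_t line = addr & ~(kLineBytes - 1);
        auto it = outstanding.find(line);
        if (it != outstanding.end()) {
            it->second.push_back(id);   // coalesce: attach as a target
            return false;
        }
        outstanding[line] = {id};       // first miss to this line
        ++issuedMisses;
        return true;
    }

    // The line came back: all queued targets complete together.
    std::vector<int> fill(uint64_t addr) {
        uint64_t line = addr & ~(kLineBytes - 1);
        auto done = std::move(outstanding[line]);
        outstanding.erase(line);
        return done;
    }
};
```

Note what this sketch cannot express: whether two *different threads'* writes 
to the same line may merge is exactly the protocol- and memory-model-dependent 
question, which is why that decision stays in the protocol.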

Brad


-----Original Message-----
From: gem5-dev [mailto:[email protected]] On Behalf Of Steve Reinhardt
Sent: Sunday, September 13, 2015 5:51 PM
To: gem5 Developer List; Gutierrez, Anthony; Andreas Hansson; Joel Hestness
Subject: [gem5-dev] O3 coalescing (was Re: Review Request 2787: ruby: Fixed 
pipeline squashes caused by aliased requests)

On Sat, Sep 12, 2015 at 7:31 AM Andreas Hansson <[email protected]>
wrote:

>
>
> > On Sept. 12, 2015, 1:54 p.m., Andreas Hansson wrote:
> > > My comment seems to have gotten lost in all the different threads
> > > going on...bad S/N. Anyways, here it is:
> > >
> > > I am of the opinion that we should probably 1) do read/write
> > > combining in the core LSQ before sending out a packet, and 2)
> > > combine MSHR targets in the L1 before propagating a miss
> > > downwards. I am not sure why we would ever do it in the Sequencer.
> > > Am I missing something?
> > >
> > > This solution would also translate very well between both Ruby
> > > and the classic memory system.
> >
> > Joel Hestness wrote:
> >     Hi Andreas. I think we all agree with you about where coalescing
> >     should happen. It appears that (1) is available from particular
> >     cores (e.g. the O3 CPU). The problem currently is that getting
> >     (2) in Ruby would require very non-trivial modification to the
> >     L1 controllers (to do combining) in each of the existing
> >     protocols (7 in gem5 + at least 2 not yet in gem5). To avoid all
> >     this protocol modification, this patch is AMD's proposal to do
> >     L1 MSHR-like combining within the sequencer. This proposed
> >     solution should be viewed as a stopgap on the road to MSHR
> >     combining in the L1 controllers.
>
> I see. Thanks for the clarification.
>
> I am fairly convinced the O3 CPU is not doing any coalescing at the 
> moment. Are you sure? If not I'd say that's the place to start. 
> Coalescing in the LSQ is probably as important as coalescing of MSHR 
> targets, if not more so.
>

Forking this off into a separate thread since it's not even one of the issues 
that Joel brought up when he tried to organize things...

Andreas, can you clarify more what you mean by adding coalescing in the O3 LSQ? 
I'm not sure exactly what you mean, and I have some concerns about how that 
would work and also what it would save us.

First, are you talking about having the LSQ track memory accesses at 
cache-block granularity, and only forward a single access per cache block to 
the L1 cache?  What happens then when I have multiple read accesses to a cache 
block, but the block arrives with a pending invalidation?  In the current cache 
we would invalidate the block after completing the one outstanding access the 
L1 is aware of, which means that the subsequent accesses from the LSQ 
(forwarded after the first hit completes) would be unnecessary misses.  You 
could fix this by hanging on to the last block in the cache even after acking 
the invalidation as long as you had additional accesses to the same block 
(actually discarding the block only on the first access to a different block), 
but that would not simplify the cache access code path :).
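A tiny sketch of the bookkeeping described above, just to make the corner case 
concrete: after acking the invalidation, the cache retains a stale copy of the 
line and keeps servicing the LSQ's remaining reads to it, discarding it only on 
the first access to a different line.  The struct and method names here are 
purely illustrative; nothing like this exists in the current cache model.

```cpp
#include <cstdint>

// Hypothetical model of retaining an invalidated line for pending
// same-block accesses from the LSQ.
struct RetainedLine {
    uint64_t line = 0;
    bool valid = false;
    int extraHits = 0;   // reads serviced after the invalidation ack

    // Invalidation acked, but keep a stale copy of the line around.
    void invalidate(uint64_t addr) { line = addr; valid = true; }

    // Returns true if the read still hits the retained line.
    bool read(uint64_t addr) {
        if (valid && addr == line) {
            ++extraHits;
            return true;
        }
        valid = false;   // first access to a different line: discard
        return false;
    }
};
```

Even in this toy form, the extra valid-but-invalidated state is exactly the 
complication in the cache access path that the paragraph above warns about.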

Second, I'm confused about how this would help, given that we need to do 
coalescing at any shared downstream (e.g., L2) cache anyway, to combine 
multiple accesses to the same block from different upstream (e.g., L1) caches.  
If pushing coalescing up into the LSQ doesn't let us remove it from the cache 
model, it doesn't seem like a big win to me.  I understand that it would be a 
win on the Ruby side, given the current situation there, but that raises the 
question in my mind of how Ruby deals with multiple coalescable accesses to 
shared downstream caches itself, and why can't that solution be applied to the 
Ruby L1 caches?

Thanks,

Steve
_______________________________________________
gem5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/gem5-dev