Hi all,

At the bottom of Steve's message, he wrote: "If pushing coalescing up into the LSQ doesn't let us remove it from the cache model, it doesn't seem like a big win to me. I understand that it would be a win on the Ruby side, given the current situation there, but that raises the question in my mind of how Ruby deals with multiple coalescable accesses to shared downstream caches itself, and why can't that solution be applied to the Ruby L1 caches?"
I just want to clarify that I do not want to push the coalescing into the LSQ. The packets generated from the CPU have always been per-scalar-instruction, per-request, and per-address. I personally want to keep them this way. The Ruby Sequencer contains protocol-independent logic that is part of the L1 cache, such as coalescing requests from a single thread.

To answer Steve's question: Ruby allows coalescing in downstream shared caches, but it is protocol dependent. The protocols can dictate what coalescing from multiple threads is permissible (for instance, multiple writes can merge under weaker memory models, but not under stronger ones). Thus it is not something we can do in a protocol-independent fashion.

Brad

-----Original Message-----
From: gem5-dev [mailto:[email protected]] On Behalf Of Steve Reinhardt
Sent: Sunday, September 13, 2015 5:51 PM
To: gem5 Developer List; Gutierrez, Anthony; Andreas Hansson; Joel Hestness
Subject: [gem5-dev] O3 coalescing (was Re: Review Request 2787: ruby: Fixed pipeline squashes caused by aliased requests)

On Sat, Sep 12, 2015 at 7:31 AM Andreas Hansson <[email protected]> wrote:

> > > On Sept. 12, 2015, 1:54 p.m., Andreas Hansson wrote:
> > > My comment seems to have gotten lost in all the different threads going on... bad S/N. Anyways, here it is:
> > >
> > > I am of the opinion that we should probably 1) do read/write combining in the core LSQ before sending out a packet, and 2) combining of MSHR targets in the L1 before propagating a miss downwards. I am not sure why we would ever do it in the Sequencer. Am I missing something?
> > >
> > > This solution would also translate very well between both Ruby and the classic memory system.
> >
> > Joel Hestness wrote:
> >     Hi Andreas. I think we all agree with you about where coalescing should happen. It appears that (1) is available from particular cores (e.g. the O3 CPU).
> >     The problem currently is that getting (2) in Ruby would require very non-trivial modification to the L1 controllers (to do combining) in each of the existing protocols (7 in gem5 + at least 2 not yet in gem5). To avoid all this protocol modification, this patch is AMD's proposal to do L1 MSHR-like combining within the sequencer. This proposed solution should be viewed as a stopgap on the road to MSHR combining in the L1 controllers.
>
> I see. Thanks for the clarification.
>
> I am fairly convinced the O3 CPU is not doing any coalescing at the moment. Are you sure? If not, I'd say that's the place to start. Coalescing in the LSQ is probably as important as coalescing of MSHR targets, if not more so.

Forking this off into a separate thread since it's not even one of the issues that Joel brought up when he tried to organize things...

Andreas, can you clarify more what you mean by adding coalescing in the O3 LSQ? I'm not sure exactly what you mean, and I have some concerns about how that would work and also about what it would save us.

First, are you talking about having the LSQ track memory accesses at cache-block granularity, and only forwarding a single access per cache block to the L1 cache? What happens then when I have multiple read accesses to a cache block, but the block arrives with a pending invalidation? In the current cache we would invalidate the block after completing the one outstanding access the L1 is aware of, which means that the subsequent accesses from the LSQ (forwarded after the first hit completes) would be unnecessary misses. You could fix this by hanging on to the last block in the cache even after acking the invalidation, as long as you had additional accesses to the same block (actually discarding the block only on the first access to a different block), but that would not simplify the cache access code path :).
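To make the block-granularity tracking being debated here concrete, below is a minimal sketch of the idea: a table keyed by cache-line address that issues only the first access to a line and queues subsequent same-line accesses behind it until the response returns. This is hypothetical illustration code; the type and member names are invented and do not correspond to gem5's actual LSQ, Sequencer, or cache classes.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <unordered_map>
#include <vector>

using Addr = uint64_t;
constexpr Addr kLineBytes = 64;  // assumed cache-line size

// Hypothetical per-thread coalescing table: one outstanding request per
// cache line; later requests to the same line piggyback on the first.
struct CoalescingTable {
    // line address -> addresses waiting on that line (front = issued one)
    std::unordered_map<Addr, std::deque<Addr>> table;

    // Returns true if this request should be sent to the cache, false if
    // it was coalesced behind an already-outstanding same-line request.
    bool insert(Addr addr) {
        Addr line = addr & ~(kLineBytes - 1);
        auto &q = table[line];
        q.push_back(addr);
        return q.size() == 1;  // only the first request to a line issues
    }

    // When the line's response arrives, every coalesced request completes
    // off that single response.
    std::vector<Addr> complete(Addr line) {
        std::vector<Addr> done(table[line].begin(), table[line].end());
        table.erase(line);
        return done;
    }
};
```

Note that this sketch sidesteps the pending-invalidation hazard described above: if the line is invalidated right after the first access completes, everything still queued in the table turns into extra misses unless the cache holds the block a little longer.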
Second, I'm confused about how this would help, given that we need to do coalescing at any shared downstream (e.g., L2) cache anyway, to combine multiple accesses to the same block from different upstream (e.g., L1) caches. If pushing coalescing up into the LSQ doesn't let us remove it from the cache model, it doesn't seem like a big win to me. I understand that it would be a win on the Ruby side, given the current situation there, but that raises the question in my mind of how Ruby deals with multiple coalescable accesses to shared downstream caches itself, and why can't that solution be applied to the Ruby L1 caches?

Thanks,
Steve

_______________________________________________
gem5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/gem5-dev
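For readers following the thread, the "combining of MSHR targets" that Andreas proposes for the L1 (and that the AMD patch approximates in the Sequencer) can be sketched as follows. This is an illustrative toy, not the classic cache's real MSHR API: the first miss to a block allocates an MSHR and sends one request downstream, while later misses to the same block merely append a target and generate no extra downstream traffic.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

using Addr = uint64_t;

// Hypothetical MSHR: one in-flight block plus the list of consumers
// (e.g. LSQ entry ids) waiting for it.
struct MSHR {
    Addr blockAddr = 0;
    std::vector<int> targets;
};

struct MSHRFile {
    std::unordered_map<Addr, MSHR> mshrs;

    // Returns true if a new downstream request must be sent for this miss,
    // false if it was combined into an existing MSHR's target list.
    bool handleMiss(Addr blockAddr, int targetId) {
        auto it = mshrs.find(blockAddr);
        if (it != mshrs.end()) {
            it->second.targets.push_back(targetId);  // combined, no new request
            return false;
        }
        mshrs[blockAddr] = MSHR{blockAddr, {targetId}};
        return true;
    }

    // On fill, the single response satisfies every combined target.
    std::vector<int> handleFill(Addr blockAddr) {
        auto done = std::move(mshrs[blockAddr].targets);
        mshrs.erase(blockAddr);
        return done;
    }
};
```

The same structure is what a shared downstream cache needs anyway, which is Steve's point: if an L2 must combine misses from multiple L1s, hoisting per-thread coalescing into the LSQ does not remove this machinery from the cache model.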
