Hi Brad,

It's reasonable that you prefer to work with your existing changes. Could you resubmit the aforementioned patch for review so it can be merged upstream? That would give us a stable starting point from which to make further improvements.
Thanks in advance,
Timothy

________________________________
From: gem5-dev <[email protected]> on behalf of Beckmann, Brad <[email protected]>
Sent: 24 September 2019 17:48
To: gem5 Developer List <[email protected]>
Subject: Re: [gem5-dev] Ruby Sequencer starving O3 core?

Hi Timothy,

As Jason said, this is a known issue. In fact, we tried to fix it many years ago in the public tree, but we had difficulty getting the patch approved and eventually abandoned the effort. http://reviews.gem5.org/r/2276/

We still have this patch applied to our internal tree and it works quite well. Another key change along the same lines is splitting the store address request from the store data request. Unfortunately, that patch has never made it out of our internal tree, and we need to find someone within AMD to maintain it before we can push it publicly.

There are a few things to keep in mind when thinking about how to improve the CPU/GPU-to-Ruby interface. The Sequencer and Coalescer implement protocol-agnostic logic, such as address coalescing and request tracking. We could move this logic into the L1 cache controllers, but that would require duplicating the work in each controller and further complicating the already complicated state transitions. Furthermore, the generated protocol state machines operate only on cache-line-aligned addresses, whereas all gem5 CPU and GPU core models send byte-aligned addresses to their ports. Thus the Sequencer and Coalescer are in charge of the byte-to-cache-line address conversion.

I hope this helps; let us know how you want to proceed.

Thanks,
Brad

-----Original Message-----
From: gem5-dev <[email protected]> On Behalf Of Jason Lowe-Power
Sent: Tuesday, September 24, 2019 8:11 AM
To: gem5 Developer List <[email protected]>
Subject: Re: [gem5-dev] Ruby Sequencer starving O3 core?

Hi Timothy,

The short answer is that this is a quasi-known issue. The interface between the core and Ruby needs to be improved.
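[Editor's note: the byte-to-cache-line conversion Brad describes can be illustrated with a small sketch. This is not gem5's actual code; the helper names and the assumed 64-byte line size are illustrative only.]

```cpp
#include <cassert>
#include <cstdint>

// Assumed 64-byte cache line, for illustration only.
constexpr uint64_t kLineBytes = 64;

// Mask off the offset bits: the line-aligned address is what the
// generated protocol state machines operate on.
inline uint64_t makeLineAddress(uint64_t byteAddr)
{
    return byteAddr & ~(kLineBytes - 1);
}

// Offset of the byte within its line: needed when copying response
// data back into the original byte-aligned packet.
inline uint64_t getOffset(uint64_t byteAddr)
{
    return byteAddr & (kLineBytes - 1);
}
```

Two byte-aligned requests to 0x1200 and 0x1234 therefore map to the same line address (0x1200), which is exactly why the Sequencer must coalesce and track them before the protocol sees anything.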
(It's on the roadmap! Though no one is actively working on it.)

I could be wrong myself, but I believe you're correct that Ruby cannot handle multiple loads to the same cache block. I believe that in previous incarnations of the simulator the coalescing into cache blocks happened in the LSQ. However, the classic caches assume this happens in the cache, creating a mismatch between Ruby and the classic caches.

I'm not sure what the best fix for this is. Unless it's a small change, we should probably discuss the design with Brad and Tony before putting significant effort into coding.

Cheers,
Jason

On Sun, Sep 22, 2019 at 3:20 PM Timothy Hayes <[email protected]> wrote:
> I'm experimenting with various O3 configurations combined with Ruby's
> MESI_Three_Level memory subsystem. I notice that it's very challenging
> to provide the core with more memory bandwidth. For typical/realistic
> O3/Ruby/memory parameters, a single core struggles to achieve 3000
> MB/s in STREAM. If I max out all the parameters of the O3 core, Ruby,
> the NoC and provide a lot of memory bandwidth, STREAM just about
> reaches 6000 MB/s. I believe this should be much higher. I've found
> one possible explanation for this behaviour.
>
> The Ruby Sequencer receives memory requests from the core via
> Sequencer::insertRequest(PacketPtr pkt, RubyRequestType request_type).
> This function determines whether there are outstanding requests to the
> same cache line and, if there are, returns without enqueuing the new
> request. This also happens for load requests when there is already an
> outstanding load request to the same cache line.
>
>     RequestTable::value_type default_entry(line_addr,
>         (SequencerRequest*) NULL);
>     pair<RequestTable::iterator, bool> r =
>         m_readRequestTable.insert(default_entry);
>
>     if (r.second) {
>         /* snip */
>     } else {
>         // There is an outstanding read request for the cache line
>         m_load_waiting_on_load++;
>         return RequestStatus_Aliased;
>     }
>
> This eventually returns to the LSQ, which interprets the Aliased
> RequestStatus as the cache controller being blocked.
>
>     bool
>     LSQUnit<Impl>::trySendPacket(bool isLoad, PacketPtr data_pkt)
>     {
>         /* snip */
>         if (!lsq->cacheBlocked() && lsq->cachePortAvailable(isLoad)) {
>             if (!dcachePort->sendTimingReq(data_pkt)) {
>                 ret = false;
>                 cache_got_blocked = true;
>             }
>         }
>         if (cache_got_blocked) {
>             lsq->cacheBlocked(true);
>             ++lsqCacheBlocked;
>         }
>         /* snip */
>     }
>
> If the code is generating many load requests to contiguous memory,
> e.g. in STREAM, won't the cache get blocked extremely frequently?
> Would this explain why it's so difficult to get the core to consume
> more bandwidth?
>
> I'm happy to go ahead and fix/improve this, but I wanted to check
> first that I'm not missing something: can Ruby handle multiple
> outstanding loads to the same cache line without blocking the cache?
>
> --
>
> Timothy Hayes
>
> Senior Research Engineer
>
> Arm Research
>
> Phone: +44-1223405170
>
> [email protected]
>
> IMPORTANT NOTICE: The contents of this email and any attachments are
> confidential and may also be privileged. If you are not the intended
> recipient, please notify the sender immediately and do not disclose
> the contents to any other person, use it for any purpose, or store or
> copy the information in any medium. Thank you.
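[Editor's note: one possible direction for the fix Timothy proposes is a read table that coalesces loads to an in-flight line instead of returning RequestStatus_Aliased. The sketch below is hypothetical: the `PendingReads` class, `LineAddr` alias, and simplified `SequencerRequest` are stand-ins, not gem5's actual types.]

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

using LineAddr = uint64_t;
struct SequencerRequest { int id; };  // stand-in for gem5's type

// Hypothetical read table that coalesces loads to the same cache
// line rather than rejecting them as aliased.
class PendingReads
{
  public:
    // Returns true if this request started a new line miss, false
    // if it was coalesced onto an already in-flight one. Either
    // way, the request is recorded and will be serviced.
    bool
    insert(LineAddr line, SequencerRequest req)
    {
        auto &waiters = table_[line];
        waiters.push_back(req);
        return waiters.size() == 1;
    }

    // On a fill, pop every request waiting on the line so the
    // Sequencer can complete them all from one data response.
    std::vector<SequencerRequest>
    complete(LineAddr line)
    {
        auto it = table_.find(line);
        if (it == table_.end())
            return {};
        std::vector<SequencerRequest> done = std::move(it->second);
        table_.erase(it);
        return done;
    }

  private:
    std::map<LineAddr, std::vector<SequencerRequest>> table_;
};
```

With something like this, back-to-back loads to contiguous addresses (as in STREAM) would not bounce back to the LSQ as a blocked cache; only the first access per line would issue a protocol request, and the rest would ride along on the fill.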
> _______________________________________________
> gem5-dev mailing list
> [email protected]
> http://m5sim.org/mailman/listinfo/gem5-dev
