Hi Timothy,

The short answer is that this is a quasi-known issue. The interface between the core and Ruby needs to be improved. (It's on the roadmap, though no one is actively working on it.)
I could be wrong myself, but I believe you're correct that Ruby cannot handle multiple outstanding loads to the same cache block. In previous incarnations of the simulator, I believe the coalescing into cache blocks happened in the LSQ. The classic caches, however, assume this happens in the cache, creating a mismatch between Ruby and the classic caches.

I'm not sure what the best fix for this is. Unless it's a small change, we should probably discuss the design with Brad and Tony before putting significant effort into coding.

Cheers,
Jason

On Sun, Sep 22, 2019 at 3:20 PM Timothy Hayes <[email protected]> wrote:

> I'm experimenting with various O3 configurations combined with Ruby's
> MESI_Three_Level memory subsystem. I notice that it's very challenging to
> provide the core with more memory bandwidth. With typical/realistic
> O3/Ruby/memory parameters, a single core struggles to achieve 3000 MB/s in
> STREAM. If I max out all the parameters of the O3 core, Ruby and the NoC, and
> provide a lot of memory bandwidth, STREAM just about reaches 6000 MB/s. I
> believe this should be much higher. I've found one possible explanation for
> this behaviour.
>
> The Ruby Sequencer receives memory requests from the core via
> Sequencer::insertRequest(PacketPtr pkt, RubyRequestType request_type). This
> function determines whether there are outstanding requests to the same cache
> line and, if there are, returns without enqueueing the new request. This also
> happens for a load request when there is already an outstanding load
> request to the same cache line.
>
> RequestTable::value_type default_entry(line_addr, (SequencerRequest*) NULL);
> pair<RequestTable::iterator, bool> r =
>     m_readRequestTable.insert(default_entry);
>
> if (r.second) {
>     /* snip */
> } else {
>     // There is an outstanding read request for the cache line
>     m_load_waiting_on_load++;
>     return RequestStatus_Aliased;
> }
>
> This eventually returns to the LSQ, which interprets the Aliased
> RequestStatus as the cache controller being blocked.
>
> bool
> LSQUnit<Impl>::trySendPacket(bool isLoad, PacketPtr data_pkt)
> {
>     bool ret = true;
>     bool cache_got_blocked = false;
>     if (!lsq->cacheBlocked() && lsq->cachePortAvailable(isLoad)) {
>         if (!dcachePort->sendTimingReq(data_pkt)) {
>             ret = false;
>             cache_got_blocked = true;
>         }
>     }
>     if (cache_got_blocked) {
>         lsq->cacheBlocked(true);
>         ++lsqCacheBlocked;
>     }
>     return ret;
> }
>
> If the code is generating many load requests to contiguous memory, e.g. in
> STREAM, won't the cache get blocked extremely frequently? Would this
> explain why it's so difficult to get the core to consume more bandwidth?
>
> I'm happy to go ahead and fix/improve this, but I wanted to check first
> that I'm not missing something: can Ruby handle multiple outstanding loads
> to the same cache line without blocking the cache?
>
> --
>
> Timothy Hayes
> Senior Research Engineer
> Arm Research
> Phone: +44-1223405170
> [email protected]
>
> _______________________________________________
> gem5-dev mailing list
> [email protected]
> http://m5sim.org/mailman/listinfo/gem5-dev
