Hi Timothy,

The short answer is that this is a quasi-known issue. The interface between
the core and Ruby needs to be improved. (It's on the roadmap, though no one
is actively working on it.)

I could be wrong myself, but I believe you're correct that Ruby cannot
handle multiple outstanding loads to the same cache block. I believe that in
previous incarnations of the simulator the coalescing of requests into cache
blocks happened in the LSQ. The classic caches, however, assume this
coalescing happens in the cache, creating a mismatch between Ruby and the
classic caches.

I'm not sure what the best fix for this is. Unless it's a small change, we
should probably discuss the design with Brad and Tony before putting
significant effort into coding.
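One possible direction, purely as a sketch (none of these class or method
names exist in gem5), would be an MSHR-style table in the Sequencer that
coalesces same-line loads instead of rejecting them, similar to what the
classic caches do with MSHR targets:

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical sketch, not gem5 API: coalesce loads to the same cache
// line so only the first one is issued toward memory.
enum class Status { Issued, Coalesced };

class CoalescingTable {
  public:
    explicit CoalescingTable(unsigned blkSize) : blkSize(blkSize) {}

    Status insertLoad(uint64_t addr, int seqNum) {
        uint64_t line = addr & ~uint64_t(blkSize - 1);
        auto &targets = table[line];
        targets.push_back(seqNum);
        // First request to this line goes to memory; later ones wait
        // on the same fill instead of being rejected as aliased.
        return targets.size() == 1 ? Status::Issued : Status::Coalesced;
    }

    // On fill, all coalesced targets for the line complete together.
    std::vector<int> completeFill(uint64_t lineAddr) {
        std::vector<int> targets = std::move(table.at(lineAddr));
        table.erase(lineAddr);
        return targets;
    }

  private:
    unsigned blkSize;
    std::unordered_map<uint64_t, std::vector<int>> table;
};
```

The first load to a line issues toward memory; subsequent loads latch onto
the pending entry and all complete when the fill returns, so the cache port
never needs to block on an alias.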

Cheers,
Jason

On Sun, Sep 22, 2019 at 3:20 PM Timothy Hayes <[email protected]> wrote:

> I'm experimenting with various O3 configurations combined with Ruby's
> MESI_Three_Level memory subsystem. I notice that it's very challenging to
> provide the core with more memory bandwidth. For typical/realistic
> O3/Ruby/memory parameters, a single core struggles to achieve 3000 MB/s in
> STREAM. If I max out all the parameters of the O3 core, Ruby, the NoC and
> provide a lot of memory bandwidth, STREAM just about reaches 6000 MB/s. I
> believe this should be much higher. I've found one possible explanation for
> this behaviour.
>
> The Ruby Sequencer receives memory requests from the core via the function
> Sequencer::insertRequest(PacketPtr pkt, RubyRequestType request_type). This
> function determines whether there is already an outstanding request to the
> same cache line and, if there is, returns without enqueuing the new request.
> This also happens for load requests when there is already an outstanding
> load request to the same cache line.
>
> RequestTable::value_type default_entry(line_addr, (SequencerRequest*)NULL);
> pair<RequestTable::iterator, bool> r =
>     m_readRequestTable.insert(default_entry);
>
> if (r.second) {
>     /* snip */
> } else {
>     // There is an outstanding read request for the cache line
>     m_load_waiting_on_load++;
>     return RequestStatus_Aliased;
> }
>
> This eventually returns to the LSQ which interprets the Aliased
> RequestStatus as the cache controller being blocked.
>
> bool LSQUnit<Impl>::trySendPacket(bool isLoad, PacketPtr data_pkt)
> {
>     bool ret = true;
>     bool cache_got_blocked = false;
>     /* snip */
>     if (!lsq->cacheBlocked() &&
>         lsq->cachePortAvailable(isLoad)) {
>         if (!dcachePort->sendTimingReq(data_pkt)) {
>             ret = false;
>             cache_got_blocked = true;
>         }
>     }
>     if (cache_got_blocked) {
>         lsq->cacheBlocked(true);
>         ++lsqCacheBlocked;
>     }
>     /* snip */
>     return ret;
> }
>
> If the code is generating many load requests to contiguous memory, e.g. in
> STREAM, won't the cache get blocked extremely frequently? Would this
> explain why it's so difficult to get the core to consume more bandwidth?
>
> I'm happy to go ahead and fix/improve this, but I wanted to check first
> that I'm not missing something--can Ruby handle multiple outstanding loads
> to the same cache line without blocking the cache?
>
>
> --
>
> Timothy Hayes
>
> Senior Research Engineer
>
> Arm Research
>
> Phone: +44-1223405170
>
> [email protected]
>
>
>
>
> IMPORTANT NOTICE: The contents of this email and any attachments are
> confidential and may also be privileged. If you are not the intended
> recipient, please notify the sender immediately and do not disclose the
> contents to any other person, use it for any purpose, or store or copy the
> information in any medium. Thank you.
> _______________________________________________
> gem5-dev mailing list
> [email protected]
> http://m5sim.org/mailman/listinfo/gem5-dev
