Hi Brad,

It's reasonable that you prefer to work with your existing changes. Could you resubmit the aforementioned patch for review so it can be merged upstream? That would give us a stable starting point from which to make further improvements.
Thanks in advance,
Timothy

________________________________
From: gem5-dev <[email protected]> on behalf of Beckmann, Brad <[email protected]>
Sent: 24 September 2019 17:48
To: gem5 Developer List <[email protected]>
Subject: Re: [gem5-dev] Ruby Sequencer starving O3 core?

Hi Timothy,

As Jason said, this is a known issue. In fact, we tried to fix it many years ago in the public tree, but we had difficulty getting the patch approved and eventually abandoned the effort. http://reviews.gem5.org/r/2276/

We still have this patch applied to our internal tree and it works quite well. Another key change along the same lines is splitting the store address request from the store data request. Unfortunately, that patch has never made it out of our internal tree, and we need to find someone within AMD to maintain it before we can push it publicly.

There are a few things to keep in mind when thinking about how to improve the CPU/GPU-to-Ruby interface. The Sequencer and Coalescer implement protocol-agnostic logic, such as address coalescing and request tracking. We could move this logic into the L1 cache controllers, but that would require duplicating the work in each controller and further complicating the already complicated state transitions. Furthermore, the generated protocol state machines operate only on cache-line-aligned addresses, whereas all gem5 CPU and GPU core models send byte-aligned addresses to their ports. Thus the Sequencer and Coalescer are in charge of the byte-to-cache-line address conversion.

I hope this helps; let us know how you want to proceed.

Thanks,
Brad

-----Original Message-----
From: gem5-dev <[email protected]> On Behalf Of Jason Lowe-Power
Sent: Tuesday, September 24, 2019 8:11 AM
To: gem5 Developer List <[email protected]>
Subject: Re: [gem5-dev] Ruby Sequencer starving O3 core?

Hi Timothy,

The short answer is that this is a quasi-known issue. The interface between the core and Ruby needs to be improved.
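[Editor's note: the byte-to-cache-line conversion Brad describes can be illustrated with a small sketch. This is not gem5's actual code; the helper names and the assumed 64-byte line size are illustrative only.]

```cpp
#include <cassert>
#include <cstdint>

// Assumed 64-byte cache line, for illustration only.
constexpr uint64_t kLineBytes = 64;

// Mask off the offset bits: the line-aligned address is what the
// generated protocol state machines operate on.
inline uint64_t makeLineAddress(uint64_t byteAddr)
{
    return byteAddr & ~(kLineBytes - 1);
}

// Offset of the byte within its line: needed when copying response
// data back into the original byte-aligned packet.
inline uint64_t getOffset(uint64_t byteAddr)
{
    return byteAddr & (kLineBytes - 1);
}
```

Two byte-aligned requests to 0x1200 and 0x1234 therefore map to the same line address (0x1200), which is exactly why the Sequencer must coalesce and track them before the protocol sees anything.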
(It's on the roadmap! Though no one is actively working on it.)

I could be wrong myself, but I believe you're correct that Ruby cannot handle multiple loads to the same cache block. I believe that in previous incarnations of the simulator the coalescing into cache blocks happened in the LSQ. However, the classic caches assume this happens in the cache, creating a mismatch between Ruby and the classic caches.

I'm not sure what the best fix for this is. Unless it's a small change, we should probably discuss the design with Brad and Tony before putting significant effort into coding.

Cheers,
Jason

On Sun, Sep 22, 2019 at 3:20 PM Timothy Hayes <[email protected]> wrote:
> I'm experimenting with various O3 configurations combined with Ruby's
> MESI_Three_Level memory subsystem. I notice that it's very challenging
> to provide the core with more memory bandwidth. For typical/realistic
> O3/Ruby/memory parameters, a single core struggles to achieve 3000
> MB/s in STREAM. If I max out all the parameters of the O3 core, Ruby,
> the NoC and provide a lot of memory bandwidth, STREAM just about
> reaches 6000 MB/s. I believe this should be much higher. I've found
> one possible explanation for this behaviour.
>
> The Ruby Sequencer receives memory requests from the core via
> Sequencer::insertRequest(PacketPtr pkt, RubyRequestType request_type).
> This function determines whether there are outstanding requests to the
> same cache line and, if there are, returns without enqueuing the new
> request. This also happens for load requests when there is already an
> outstanding load request to the same cache line.
>
>     RequestTable::value_type default_entry(line_addr,
>         (SequencerRequest*) NULL);
>     pair<RequestTable::iterator, bool> r =
>         m_readRequestTable.insert(default_entry);
>
>     if (r.second) {
>         /* snip */
>     } else {
>         // There is an outstanding read request for the cache line
>         m_load_waiting_on_load++;
>         return RequestStatus_Aliased;
>     }
>
> This eventually returns to the LSQ, which interprets the Aliased
> RequestStatus as the cache controller being blocked.
>
>     bool
>     LSQUnit<Impl>::trySendPacket(bool isLoad, PacketPtr data_pkt)
>     {
>         /* snip */
>         if (!lsq->cacheBlocked() && lsq->cachePortAvailable(isLoad)) {
>             if (!dcachePort->sendTimingReq(data_pkt)) {
>                 ret = false;
>                 cache_got_blocked = true;
>             }
>         }
>         if (cache_got_blocked) {
>             lsq->cacheBlocked(true);
>             ++lsqCacheBlocked;
>         }
>         /* snip */
>     }
>
> If the code is generating many load requests to contiguous memory,
> e.g. in STREAM, won't the cache get blocked extremely frequently?
> Would this explain why it's so difficult to get the core to consume
> more bandwidth?
>
> I'm happy to go ahead and fix/improve this, but I wanted to check
> first that I'm not missing something: can Ruby handle multiple
> outstanding loads to the same cache line without blocking the cache?
>
> --
>
> Timothy Hayes
>
> Senior Research Engineer
>
> Arm Research
>
> Phone: +44-1223405170
>
> [email protected]
>
> IMPORTANT NOTICE: The contents of this email and any attachments are
> confidential and may also be privileged. If you are not the intended
> recipient, please notify the sender immediately and do not disclose
> the contents to any other person, use it for any purpose, or store or
> copy the information in any medium. Thank you.
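[Editor's note: one possible direction for the fix Timothy proposes is a read table that coalesces loads to an in-flight line instead of returning RequestStatus_Aliased. The sketch below is hypothetical: the `PendingReads` class, `LineAddr` alias, and simplified `SequencerRequest` are stand-ins, not gem5's actual types.]

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

using LineAddr = uint64_t;
struct SequencerRequest { int id; };  // stand-in for gem5's type

// Hypothetical read table that coalesces loads to the same cache
// line rather than rejecting them as aliased.
class PendingReads
{
  public:
    // Returns true if this request started a new line miss, false
    // if it was coalesced onto an already in-flight one. Either
    // way, the request is recorded and will be serviced.
    bool
    insert(LineAddr line, SequencerRequest req)
    {
        auto &waiters = table_[line];
        waiters.push_back(req);
        return waiters.size() == 1;
    }

    // On a fill, pop every request waiting on the line so the
    // Sequencer can complete them all from one data response.
    std::vector<SequencerRequest>
    complete(LineAddr line)
    {
        auto it = table_.find(line);
        if (it == table_.end())
            return {};
        std::vector<SequencerRequest> done = std::move(it->second);
        table_.erase(it);
        return done;
    }

  private:
    std::map<LineAddr, std::vector<SequencerRequest>> table_;
};
```

With something like this, back-to-back loads to contiguous addresses (as in STREAM) would not bounce back to the LSQ as a blocked cache; only the first access per line would issue a protocol request, and the rest would ride along on the fill.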
> _______________________________________________
> gem5-dev mailing list
> [email protected]
> http://m5sim.org/mailman/listinfo/gem5-dev
