Brad,
As long as we use multiple ports only to fetch coherence permissions and
only one store is performed at a time, it is intuitively clear to me that
SC and TSO can be implemented. But if we implement this, it might mean
forgoing the Alpha-like memory model that we have in place right now. This
goes back to my earlier question: which memory model(s) are we interested
in? Do we prefer co-existence of multiple memory models?
--
Nilay
On Wed, 9 Nov 2011, Beckmann, Brad wrote:
Hi Nilay,
With regard to your question about how to allow multiple simultaneous
stores, do you not believe my second and third proposals achieve that?
As I stated before, I don't think we need to make any fundamental
changes to Ruby. We just need to provide the correct information and
interfaces to the LSQ/Store Buffer.
Brad
-----Original Message-----
From: Nilay Vaish [mailto:[email protected]]
Sent: Tuesday, November 08, 2011 6:12 PM
To: Beckmann, Brad
Cc: Default; Mark D. Hill
Subject: RE: Review Request: Forward invalidations from Ruby to O3 CPU
On Wed, 2 Nov 2011, Nilay Vaish wrote:
On Fri, 28 Oct 2011, Beckmann, Brad wrote:
Let’s move this conversation to just the email thread.
I suspect we may be talking past each other, so let’s talk about the
complete implementations, not just Ruby. There are multiple ways one can
implement the store portion of x86-TSO. I’m not sure what the O3 model
does, but here are a few possibilities:
- Do not issue any part of the store to the memory system when the
  instruction is executed. Instead, simply buffer it in the LSQ until the
  instruction retires, then buffer it in the store buffer after retirement.
  Issue the store to Ruby only when it reaches the head of the store
  buffer. The next store is not issued to Ruby until the previous head
  store completes, maintaining correct store ordering.
- Do not issue any part of the store to the memory system when the
  instruction is executed. Instead, simply buffer it in the LSQ until the
  instruction retires. Once it retires and enters the store buffer, we
  issue the address request to Ruby (no L1 data update). Ruby forwards
  probes/replacements to the store buffer, and if the store buffer sees a
  probe/replacement to an address whose address request has already
  completed, the store buffer reissues the request. Once the store reaches
  the head of the store buffer, double-check with Ruby that write
  permissions still exist in the L1.
- Issue the store address (no L1 data update) to Ruby when the instruction
  is executed. When it retires, it enters the store buffer. Ruby forwards
  probes/replacements to the LSQ+store buffer, and if either sees a
  probe/replacement to an address whose address request has already
  completed, the request is reissued (several policies exist on when to
  reissue the request). Once the store reaches the head of the store
  buffer, double-check with Ruby that write permissions still exist in
  the L1.
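As a rough illustration of the second scenario, here is a minimal Python
sketch. Everything in it is an assumption made for illustration: FakeRuby,
StoreBuffer, and their methods are hypothetical names, not gem5 or Ruby
interfaces, and address requests complete instantly, which a real memory
system would not do. It only shows the shape of the idea: permissions may be
fetched for several stores at once, but data is written one store at a time
from the head of the buffer.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Store:
    addr: int
    data: int
    has_permission: bool = False  # set once the address request completes


class FakeRuby:
    """Hypothetical stand-in for the memory system; not gem5's Ruby API."""

    def __init__(self):
        self.writable = set()      # addresses held with write permission
        self.memory = {}           # address -> last written value
        self.write_order = []      # order in which stores became visible
        self.address_requests = 0  # how many permission requests were seen

    def request_write_permission(self, addr):
        # Address request only: permission is granted, no data is written.
        self.address_requests += 1
        self.writable.add(addr)

    def has_write_permission(self, addr):
        return addr in self.writable

    def write(self, addr, data):
        assert addr in self.writable, "store performed without permission"
        self.memory[addr] = data
        self.write_order.append(addr)

    def probe(self, addr, store_buffer):
        # Another core takes the line: drop permission, notify the buffer.
        self.writable.discard(addr)
        store_buffer.on_probe(addr)


class StoreBuffer:
    def __init__(self, ruby):
        self.ruby = ruby
        self.entries = deque()  # oldest retired store at the left

    def retire(self, store):
        # A retired store enters the buffer and prefetches write permission.
        self.entries.append(store)
        self._issue_address_request(store)

    def _issue_address_request(self, store):
        self.ruby.request_write_permission(store.addr)
        store.has_permission = True

    def on_probe(self, addr):
        # Reissue address requests for entries whose permission was lost.
        for store in self.entries:
            if store.addr == addr and store.has_permission:
                store.has_permission = False
                self._issue_address_request(store)

    def drain_head(self):
        # Perform only the oldest store, after double-checking permission.
        if not self.entries:
            return False
        head = self.entries[0]
        if not self.ruby.has_write_permission(head.addr):
            self._issue_address_request(head)
        self.ruby.write(head.addr, head.data)
        self.entries.popleft()
        return True
```

A possible use: retire stores to 0x40 and 0x80, call
ruby.probe(0x80, store_buffer), then call drain_head() until it returns
False. In this toy model ruby.write_order still ends up in program order
because only the head store is ever written.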
Do those scenarios make sense to you? I believe we can implement any one
of them without modifying Ruby’s core functionality. If you are
envisioning or if O3 implements something completely different, please let
me know.
1. What's the current memory model that the O3 CPU implements? Do we want
multiple memory models to co-exist? We might want to have both SC and TSO,
though Alpha had a weaker model.
2. I think we should try to stick to what the O3 CPU implements currently,
meaning we should not change the stage at which the store is issued to
the cache. I am more concerned about how multiple ports get handled.
Looking at the trace generated by the toy application I use for testing
the O3 CPU and Ruby combination, I have been able to confirm my suspicion
that stores can become visible to the rest of the system in an order
different from the program order.
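A check of this kind can be sketched in a few lines of Python. The input
format here is an assumption made for illustration, not the format of any
gem5 trace: each store is a (seq_num, visible_tick) tuple, where seq_num is
the store's position in program order and visible_tick is when it became
visible to the rest of the system. The function name is likewise
hypothetical.

```python
def find_reordered_stores(events):
    """Return (earlier_seq, later_seq) pairs whose visibility is inverted.

    `events` is an iterable of (seq_num, visible_tick) tuples, one per
    store. This layout is an assumed, simplified trace format.
    """
    violations = []
    latest_tick = None  # latest visibility tick among older stores
    latest_seq = None   # the store that set latest_tick
    for seq, tick in sorted(events):  # walk the stores in program order
        if latest_tick is not None and tick < latest_tick:
            # A younger store became visible before an older one did.
            violations.append((latest_seq, seq))
        else:
            latest_tick, latest_seq = tick, seq
    return violations
```

For example, find_reordered_stores([(1, 100), (2, 90), (3, 120)]) returns
[(1, 2)], since store 2 became visible before store 1. An empty list means
the stores became visible in program order, which is what SC and TSO
require.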
It might be that the classic memory system does not allow stores to go out
of order, or that the initial implementation of the O3 CPU was for a
weaker memory model like that of the Alpha architecture (Prof. Hill
suggested that this might be the case).
Overall I am still not clear on how to make O3 and Ruby work together
correctly for SC or TSO in the case where multiple stores can be issued to
the memory system in parallel.
--
Nilay
_______________________________________________
gem5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/gem5-dev