Hi Nilay,
Sorry for the late response; I haven't checked my email since last night :).
Anyway, the checkViolations part that we are talking about takes care of
avoiding any CMP coherence violation, but it does not re-execute a load
(one that is not at the front of the commit queue) and the following younger
insts upon receiving a snoop invalidation request. So, in my understanding, it
does not enforce the strict load-load ordering of a stronger model. I
therefore added a couple of lines in checkSnoop; see the changes below.
(1) The comment on the first if clause in the existing code was wrong, I
guess: with the "if (load_idx == loadTail)" clause it was actually checking
"If there are no loads in the LSQ we don't care". So with an additional if
clause, I make sure that if the snoop hits the front of the load queue, then
nothing needs to be done.
(2) Further, I add a clause towards the end of checkSnoop() with a needsSC
condition: if the snoop hits an executed load that is not at the front of the
queue, the load is re-executed using ReExec (hopefully ReExec squashes all the
younger insts, including that load, and re-fetches them, as I understood from
Ali's response).
The other change that I made to maintain SC is to add a few more constraints
on the load queue to ensure store-load ordering, i.e. a load in the load queue
cannot retire from the ROB until the committed store instructions before it in
program order are exposed to the memory system. As a result, a load can still
receive snoop invalidates while it waits and can be re-executed if needed. I
can post my changes to enforce SC for review.
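
To illustrate the store-load constraint described above, here is a minimal
standalone sketch. It is not the actual patch and does not use gem5 types:
StoreBufferModel, commitStore, writebackOldest and loadMayRetire are
hypothetical names, and it assumes sequence numbers increase in program order
and that committed stores drain to memory oldest first.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

// Hypothetical model (not gem5 code): committed stores wait here until they
// have been written back and are visible to the memory system.
struct StoreBufferModel {
    // Sequence numbers of committed but not yet written-back stores,
    // oldest first. Assumes sequence numbers grow in program order.
    std::deque<std::uint64_t> pendingSeqNums;

    // Record a store that has committed but is not yet in memory.
    void commitStore(std::uint64_t seqNum) {
        pendingSeqNums.push_back(seqNum);
    }

    // The oldest committed store has become visible to the memory system.
    void writebackOldest() {
        if (!pendingSeqNums.empty())
            pendingSeqNums.pop_front();
    }

    // Under SC, a load may retire only when no older committed store is
    // still waiting to be exposed to memory.
    bool loadMayRetire(std::uint64_t loadSeqNum) const {
        return pendingSeqNums.empty() ||
               pendingSeqNums.front() > loadSeqNum;
    }
};
```

In this sketch a load with sequence number 5 cannot retire while a store with
sequence number 3 is still pending, so it stays in the load queue and remains
exposed to snoop invalidations. The checkSnoop() changes themselves are below.
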
template <class Impl>
void
LSQUnit<Impl>::checkSnoop(PacketPtr pkt)
{
    int load_idx = loadHead;

    if (!cacheBlockMask) {
        assert(dcachePort);
        Addr bs = dcachePort->peerBlockSize();

        // Make sure we actually got a size
        assert(bs != 0);

        cacheBlockMask = ~(bs - 1);
    }

    // If there are no loads in the LSQ we don't care
    if (load_idx == loadTail) {
        DPRINTF(LSQUnit, "loadHead: %d, loadTail:%d\n", loadHead, loadTail);
        //assert(0);
        return;
    }

    // Skip the load at the head of the queue. If it is the only load in
    // the LSQ we don't care. incrLdIdx() is used instead of load_idx + 1
    // so that the comparison also holds when the queue index wraps.
    incrLdIdx(load_idx);
    if (load_idx == loadTail) {
        DPRINTF(LSQUnit, "loadHead: %d, loadTail:%d\n", loadHead, loadTail);
        //assert(0);
        return;
    }

    DPRINTF(LSQUnit, "Got snoop for address %#x\n", pkt->getAddr());
    Addr invalidate_addr = pkt->getAddr() & cacheBlockMask;

    while (load_idx != loadTail) {
        DynInstPtr ld_inst = loadQueue[load_idx];

        // Loads without a valid address, and uncacheable loads, cannot
        // conflict with the snoop
        if (!ld_inst->effAddrValid || ld_inst->uncacheable()) {
            incrLdIdx(load_idx);
            continue;
        }

        Addr load_addr = ld_inst->physEffAddr & cacheBlockMask;
        DPRINTF(LSQUnit, "-- inst [sn:%lli] load_addr: %#x to pktAddr:%#x\n",
                ld_inst->seqNum, load_addr, invalidate_addr);

        if (load_addr == invalidate_addr) {
            if (ld_inst->possibleLoadViolation) {
                // The format string now has one specifier per argument
                DPRINTF(LSQUnit, "Conflicting load at addr %#x "
                        "(snoop addr %#x) [sn:%lli]\n",
                        ld_inst->physEffAddr, pkt->getAddr(),
                        ld_inst->seqNum);

                // Mark the load for re-execution
                ld_inst->fault = new ReExec;
            } else {
                // If an older load checks this and it's true
                // then we might have missed the snoop
                // in which case we need to invalidate to be sure
                ld_inst->hitExternalSnoop = true;

                // Under SC, re-execute any snooped load that is not at
                // the front of the queue
                if (needsSC) {
                    ld_inst->fault = new ReExec;
                }
            }
        }
        incrLdIdx(load_idx);
    }
    return;
}
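
For anyone who wants to experiment with the queue walk outside of gem5, the
following is a minimal standalone sketch of the same idea: skip the oldest
load, then mark every younger load to the snooped cache block for
re-execution when SC is required. LoadEntry, LoadQueue, next, push and snoop
are hypothetical names, not gem5's classes, and the model assumes a circular
queue in which head == tail means empty (so one slot always stays free).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a load-queue entry; not gem5's DynInst.
struct LoadEntry {
    std::uint64_t physAddr = 0;
    bool addrValid = false;    // effective address already computed
    bool needsReExec = false;  // set when a snoop forces re-execution
};

// Hypothetical circular load queue: live entries occupy [head, tail).
struct LoadQueue {
    std::vector<LoadEntry> entries;
    int head = 0;
    int tail = 0;

    explicit LoadQueue(int size) : entries(size) {}

    // Advance an index with wraparound, in the spirit of incrLdIdx().
    int next(int idx) const {
        return (idx + 1) % static_cast<int>(entries.size());
    }

    // Append a load whose address is already known; returns its slot.
    int push(std::uint64_t addr) {
        int slot = tail;
        entries[slot].physAddr = addr;
        entries[slot].addrValid = true;
        entries[slot].needsReExec = false;
        tail = next(tail);
        return slot;
    }

    // Mark loads younger than the head that touch the snooped block.
    // Returns how many loads were marked by this call.
    int snoop(std::uint64_t addr, std::uint64_t blockMask, bool needsSC) {
        if (head == tail)
            return 0;                       // no loads in the queue
        int marked = 0;
        const std::uint64_t block = addr & blockMask;
        // Start after the oldest load; if it is the only load, the loop
        // body never runs, even when the index wraps around.
        for (int idx = next(head); idx != tail; idx = next(idx)) {
            LoadEntry &ld = entries[idx];
            if (!ld.addrValid)
                continue;
            if ((ld.physAddr & blockMask) == block && needsSC) {
                ld.needsReExec = true;
                ++marked;
            }
        }
        return marked;
    }
};
```

Because the walk starts at next(head) and stops at tail, a queue holding a
single load is handled by the loop condition alone, which is why the sketch
has no separate "only load" test.
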
On 07/12/12, Nilay Vaish wrote:
> Dibakar, any progress on this front?
>
> On Wed, 27 Jun 2012, Ali Saidi wrote:
>
> >
> >
> >Hi Dibakar,
> >
> >I'm not saying that I believe this is correct for x86.
> >It seems like x86 does require more ordering than is currently provided
> >by the lsq. Hopefully someone with more x86 experience could chime in
> >and confirm that. The faulting mechanism needs an overhaul in the o3
> >cpu. There shouldn't be any fundamental difference.
> >
> >Thanks,
> >
> >Ali
> >
> >On 27.06.2012 18:08, Dibakar Gope wrote:
> >
> >>Hi Ali,
> >>
> >>from this thread,
> >>http://www.mail-archive.com/[email protected]/msg00782.html, I get an
> >>idea that a snoop invalidate will make a younger load and its following
> >>younger instructions to re-execute, if only an older load in the program
> >>order to the same cache block see an updated value. But I am not still
> >>sure, if it obeys the load-load ordering of a stronger consistency model
> >>other than ARM. Suppose for example,
> >>
> >>C0      C1
> >>St A    Ld C
> >>St B    Ld A
> >>
> >>In the above scenario, if the memory order becomes Ld A -> St A -> St
> >>B -> Ld C and if C1 receives an invalidation for cache block A, before
> >>Ld A make it to the front of the commit queue, still checkViolations()
> >>code won't squash the Ld A and any younger instructions to maintain
> >>strong consistency.
> >>
> >>My other doubt is that, can we make use of the
> >>squashDueToMemOrder() squash mechanism instead of using ReExec fault, if
> >>I want to squash the load A and younger instructions and re-fetch those
> >>again in the above scenario? ReExec waits for the faulted instruction to
> >>reach the front of the commit, is there any other fundamental difference
> >>of using ReExec in comparison to the squashDueToMemOrder() other than
> >>this?
> >>
> >>Thanks,
> >>--Dibakar
> >>
> >>On 06/25/12, Ali Saidi wrote:
> >>
> >>>ARM just requires load-load ordering (which is stronger than alpha). x86
> >>>to my knowledge requires all stores in the system to be visible in the
> >>>same order.
> >>>
> >>>Ali
> >>>
> >>>On Jun 22, 2012, at 11:50 PM, Nilay wrote:
> >>>
> >>>>What's the difference between ARM's load-load ordering and TSO? I am
> >>>>guessing in ARM not all instructions are flushed from pipe, but only
> >>>>those that are affected by the snoop. My understanding is that the O3
> >>>>CPU flushes the entire pipeline when it sees that an instruction needs
> >>>>to execute again. Since instructions commit inorder, any load that gets
> >>>>squashed would mean that all subsequent loads are squashed as well.
> >>>>
> >>>>--
> >>>>Nilay
> >>>>
> >>>>On Fri, June 22, 2012 8:47 am, Ali Saidi wrote:
> >>>>
> >>>>>HI Dibakar, I'd have to think carefully about it, but you may be right
> >>>>>about TSO. I'd hope that someone who is more familiar with x86 could
> >>>>>respond.
> >>>>>
> >>>>>Thanks, Ali
> >>>>>
> >>>>>On 22.06.2012 07:46, Dibakar Gope wrote:
> >>>>>
> >>>>>>Hi Ali, Thanks for the response. Ok, I got the point. I
> >>>>>>thought that since the O3 attempts to support the TSO for X86 , so
> >>>>>>inherently this enforces/covers the regular load-load ordering present
> >>>>>>in any stronger consistency model. But if it inline with ARM's
> >>>>>>requirements,then does it not violate x86 and TSO's conventional
> >>>>>>load-load ordering?
> >>>>>>
> >>>>>>thanks, Dibakar
_______________________________________________
gem5-users mailing list
[email protected]
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users