Korey Sewell wrote:
> OK.... So I finally figured out what's wrong here, and admittedly I'm a
> bit apprehensive of what the universal (all-ISA) fix will be, but we'll see.
>
> So let me explain the situation from the top:
>
> (1) In the Decode stage of O3, an unconditional pc-relative branch is
> resolved and the prediction history table is updated to remove the
> resolved branch (since we've already updated the predictors with the
> result). However, in the Execute stage the branch is seen as
> mispredicted AGAIN, causing the code to want to update and delete
> something from the prediction history. This triggers an assert and
> breaks execution.
>
> (2) So a few questions emerge:
> (2a) Why does this work with the previous hack disabling the early
> resolve of branches in Decode?
> Because the mispredict only happens once, and in Execute the fetch is
> redirected to the results of the execution, so we only experience
> performance degradation instead of non-functional performance.
>
> (2b) How can we mispredict a branch in Execute after we've previously
> resolved that branch in Decode?
> It just seems that the mispredict() function isn't handling the MIPS
> delay slots correctly. When a branch happens in MIPS, it sets the NPC
> = PC + 4 and NNPC = target. However, the code shows this:
>
> return readPredPC() != readNextPC() ||
>        readPredNPC() != readNextNPC() ||
>        readPredMicroPC() != readNextMicroPC();
>
> Consider for MIPS, the predPC is the target and the predNPC is
> target+4, then you're saying:
> target != pc+4
> and
> target+4 != target
> which will be wrong in most cases.
>
You're not reading this right. predPC is the predicted PC of the -next-
instruction, and predNPC is the predicted NPC of the -next- instruction.
So in MIPS predPC would -not- be the target; it would be PC + 4, like you
said. The ability of the predictor to predict two PCs is a hack and
should be redone to actually consider both PCs from the start, but it's
not blatantly wrong. Any place you see a pred*, you can switch "pred"
with "next" and get its non-speculative pair. It wouldn't make any sense
to speculate on the PC of the current instruction :).

> (3) How could this work for SPARC?
> I'm not sure. My guess is that instructions in SPARC are disregarding
> the delay slot in execution and setting the NPC to the target instead
> of setting the NPC to PC+4.
>

No. If SPARC ignored or misimplemented the branch delay slots, it would
break horribly in most cases. I spent a lot of time dealing with all the
little wrinkles and corner cases of branch delay slots and annul bits in
O3, and even if it's a -tiny- bit off, something will probably explode,
or if you're lucky, just crash the benchmark.

> SIDE ISSUE: Gabe, is SPARC not using the Unconditional Control flag? I
> ran SPARC and put this assert (assert(!inst->UnCondCtrl())) in the
> decode-branch-resolve optimization and it ran fine (for hello-world)
>

Yes (I think). SPARC does not use the Unconditional Control flag.

I think the underlying bug is that both decode and execute try to clean
up after the same mispredicted branch. The simple (sounding) solution is
to make sure execute recognizes when decode has already handled it and
then just lets it go by, or does whatever other cleanup might be needed
in that special case. You could check for the Unconditional Control flag,
for instance.

Gabe
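
To make the pred*/next* pairing concrete, here is a minimal standalone sketch. It is not actual O3 code: BranchInst, resolveTakenDelaySlotBranch(), executeShouldSquash() and the resolvedInDecode flag are hypothetical names made up for illustration, and the micro PC term is left out. It only shows that for a taken MIPS-style branch the -next- PC is the delay slot (PC + 4) and the -next- NPC is the target, and what a guard in execute might roughly look like if decode marks the branches it has already cleaned up.

```cpp
#include <cassert>
#include <cstdint>

using Addr = std::uint64_t;

// Hypothetical stand-in for a dynamic instruction; not the real O3 class.
struct BranchInst {
    Addr pc = 0;       // address of the branch itself
    Addr predPC = 0;   // predicted PC of the *next* instruction
    Addr predNPC = 0;  // predicted NPC of the *next* instruction
    Addr nextPC = 0;   // resolved PC of the next instruction
    Addr nextNPC = 0;  // resolved NPC of the next instruction

    // Assumed flag: set by decode once it has already handled the branch.
    bool resolvedInDecode = false;

    // Same shape as the comparison quoted in the thread (micro PC omitted):
    // each pred* value is compared against its non-speculative next* pair.
    bool mispredicted() const {
        return predPC != nextPC || predNPC != nextNPC;
    }
};

// Resolved state after a taken branch with one delay slot: the delay slot
// at pc + 4 runs next, and the branch target follows it.
inline void resolveTakenDelaySlotBranch(BranchInst &inst, Addr target) {
    inst.nextPC = inst.pc + 4;
    inst.nextNPC = target;
}

// Assumed guard: execute only cleans up the prediction history if decode
// has not already done so for this branch.
inline bool executeShouldSquash(const BranchInst &inst) {
    return inst.mispredicted() && !inst.resolvedInDecode;
}
```

With a branch at 0x1000 and target 0x2000, a prediction of predPC = 0x1004 and predNPC = 0x2000 matches the resolved state, while predPC = 0x2000 and predNPC = 0x2004 (the reading in the quoted message) does not. Whether a per-instruction flag or the Unconditional Control flag is the right thing to key the guard on is an open question; the sketch only shows the shape of the check.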
