Hey Korey, I saw the thread from back than and changed the architecture according to it using also as reference the 9 stage pipeline_traits.cc.
Yuval. -----Original Message----- From: [email protected] [mailto:[email protected]] On Behalf Of Korey Sewell Sent: Monday, July 09, 2012 7:57 PM To: gem5 users mailing list Subject: Re: [gem5-users] Branch prediction issue in InOrder CPU Hi, When you updated your pipeline to 12 stages, did you also update the per-pipeline instruction schedule? Should be createFrontSked()/createBackSked() functions in cpu.cc. There was a thread on this a while back that I replied to. In the event, that doesnt work, give me a day or two to look at the trace a little more closely and speculate on fix. On Sun, Jul 8, 2012 at 10:10 AM, Ali Saidi <[email protected]> wrote: > It might be Korey, any thoughts? > > I'm not that familiar with the in order cpu, but it either needs to > store the predicted address or one previous to it in all cases. My > only worry with this change is that with the RAS is used you'll end up > unnecessarily incrementing the PC and mispredicting there. > > Ali > > On Jul 5, 2012, at 5:48 AM, Yuval H. Nacson wrote: > > Hello, > > I tried to run a very simple program in the InOrder environment. > I extended the pipe to 12 stages, and also moved the prediction and > fetch update to the first stage. > > j1: subl $2, 1, $2; > addl $1, 4, $1; > bgt $2, j1; > > This loop runs for a million iterations. > The cpi for this part is expected to be 1 or very close to, but it is 4. > > Looking in the debug flags I see the following: > Cycle n: > 9522195: system.cpu.branch_predictor: [tid:0]: [sn:1002] requesting > this resource. > 9522195: system.cpu.branch_predictor: [tid:0] [sn:1002] bge > r18,0x120008220 ... PC (0x120008238=>0x12000823c) doing branch > prediction > 9522195: system.cpu.branch_predictor: [tid:0]: [asid:0] Instruction > (0x120008238=>0x12000823c) predicted target is (0x120008238=>0x120008220). > 9522195: system.cpu.branch_predictor: [tid:0]: [sn:1002]: Setting > Predicted PC to (0x120008238=>0x120008220). > 9522195: system.cpu.branch_predictor: [tid:0] [sn:1002] pushed onto > front of predHist ...predHist.size(): 1 > 9522195: system.cpu.branch_predictor: [tid:0]: [sn:1002]: Branch > predicted true. > 9522195: system.cpu.branch_predictor: [tid:0]: [sn:1002]: bge > Predicted PC is (0x120008238=>0x120008220). > 9522195: system.cpu.branch_predictor[slot:0]:: done with request from > [sn:1002] [tid:0]. > 9522195: system.cpu.stage0: [tid:0]: [sn:1002] request to > system.cpu.branch_predictor completed. > 9522195: system.cpu.branch_predictor: Deallocating [slot:0]. > 9522195: system.cpu.stage0: [tid:0]: [sn:1002]: sending request to > system.cpu.fetch_seq_unit. > 9522195: system.cpu.fetch_seq_unit: Allocating [slot:1] for [tid:0]: > [sn:1002] > 9522195: system.cpu.fetch_seq_unit: [tid:0]: [sn:1002] requesting this > resource. > 9522195: system.cpu.fetch_seq_unit: [tid:0] Setting up squash to start > from stage 0, after [sn:0]. > 9522195: system.cpu.fetch_seq_unit[slot:1]:: done with request from > [sn:1002] [tid:0]. > 9522195: system.cpu.stage0: [tid:0]: [sn:1002] request to > system.cpu.fetch_seq_unit completed. > 9522195: system.cpu.fetch_seq_unit: Deallocating [slot:1]. > 9522195: system.cpu.stage0: [tid:0]: Attempting to send instructions > to stage 1. > 9522195: system.cpu.stage0: [tid:0] 1 slots available in next stage buffer. > 9522195: system.cpu.stage0: [tid:0]: [sn:1002]: being placed into > index 0 of stage buffer 0 queue. > 9522195: system.cpu.stage0: Done Processing [tid:0] > 9522195: system.cpu.stage0: > > Cycle n+1: > 9522612: system.cpu.fetch_seq_unit: Allocating [slot:0] for [tid:0]: > [sn:1003] > 9522612: system.cpu.fetch_seq_unit: [tid:0]: instruction requesting > this resource. > 9522612: system.cpu.fetch_seq_unit: [tid:0]: Current PC is > (0x120008238=>0x120008220) > 9522612: system.cpu.fetch_seq_unit: [tid:0]: Assigning [sn:1003] to PC > (0x120008238=>0x120008220) > 9522612: system.cpu.fetch_seq_unit[slot:0]:: done with request from > [sn:1003] [tid:0]. > > I cleaned a lot of unnecessary messeges for clarity. But the Idea is > that the branch command does not update the fetch correctly. The > predictor has problems predicting the second command claiming that the > BTB missing an entry for this command and so the branch is predicted > as not taken > (reminder: this is the second consecutive appearance of this branch). > Later on stage 10 (out of 12 in my pipe) the second branch is being > flushed because of miss prediction. And it's happens all the time > causing my CPI to be 4 times it's calculated value. > > I located the problem in the bpred_unit.cc file. And after I receive a > prediction from the predictor I advance it making the fetch unit not > fetching the second branch. Now all is well. > > I attach my change (It's towards the end): > > bool > BPredUnit::predict(DynInstPtr &inst, TheISA::PCState &predPC, ThreadID > tid) { > // See if branch predictor predicts taken. > // If so, get its target addr either from the BTB or the RAS. > // Save off record of branch stuff so the RAS can be fixed > // up once it's done. > > using TheISA::MachInst; > > int asid = inst->asid; > bool pred_taken = false; > TheISA::PCState target; > > ++lookups; > DPRINTF(InOrderBPred, "[tid:%i] [sn:%i] %s ... PC %s doing branch " > "prediction\n", tid, inst->seqNum, > inst->staticInst->disassemble(inst->instAddr()), > inst->pcState()); > > > void *bp_history = NULL; > > if (inst->isUncondCtrl()) { > DPRINTF(InOrderBPred, "[tid:%i] Unconditional control.\n", > tid); > pred_taken = true; > // Tell the BP there was an unconditional branch. > BPUncond(bp_history); > > if (inst->isReturn() && RAS[tid].empty()) { > DPRINTF(InOrderBPred, "[tid:%i] RAS is empty, predicting " > "false.\n", tid); > pred_taken = false; > } > } else { > ++condPredicted; > > pred_taken = BPLookup(predPC.instAddr(), bp_history); > } > > PredictorHistory predict_record(inst->seqNum, predPC, pred_taken, > bp_history, tid); > > // Now lookup in the BTB or RAS. > if (pred_taken) { > if (inst->isReturn()) { > ++usedRAS; > > // If it's a function return call, then look up the address > // in the RAS. > TheISA::PCState rasTop = RAS[tid].top(); > target = TheISA::buildRetPC(inst->pcState(), rasTop); > > // Record the top entry of the RAS, and its index. > predict_record.usedRAS = true; > predict_record.RASIndex = RAS[tid].topIdx(); > predict_record.rasTarget = rasTop; > > assert(predict_record.RASIndex < 16); > > RAS[tid].pop(); > > DPRINTF(InOrderBPred, "[tid:%i]: Instruction %s is a return, " > "RAS predicted target: %s, RAS index: %i.\n", > tid, inst->pcState(), target, > predict_record.RASIndex); > } else { > ++BTBLookups; > > if (inst->isCall()) { > > RAS[tid].push(inst->pcState()); > > // Record that it was a call so that the top RAS entry can > // be popped off if the speculation is incorrect. > predict_record.wasCall = true; > > DPRINTF(InOrderBPred, "[tid:%i]: Instruction %s was a call" > ", adding %s to the RAS index: %i.\n", > tid, inst->pcState(), predPC, > RAS[tid].topIdx()); > } > > if (inst->isCall() && > inst->isUncondCtrl() && > inst->isDirectCtrl()) { > target = inst->branchTarget(); > } else if (BTB.valid(predPC.instAddr(), asid)) { > ++BTBHits; > > // If it's not a return, use the BTB to get the target addr. > target = BTB.lookup(predPC.instAddr(), asid); > > DPRINTF(InOrderBPred, "[tid:%i]: [asid:%i] Instruction %s " > "predicted target is %s.\n", > tid, asid, inst->pcState(), target); > } else { > DPRINTF(InOrderBPred, "[tid:%i]: BTB doesn't have a " > "valid entry, predicting false.\n",tid); > pred_taken = false; > } > } > } > > if (pred_taken) { > // Set the PC and the instruction's predicted target. > predPC = target; > TheISA::advancePC(predPC, inst->staticInst); ß > } > DPRINTF(InOrderBPred, "[tid:%i]: [sn:%i]: Setting Predicted PC to > %s.\n", > tid, inst->seqNum, predPC); > > predHist[tid].push_front(predict_record); > > DPRINTF(InOrderBPred, "[tid:%i] [sn:%i] pushed onto front of predHist " > "...predHist.size(): %i\n", > tid, inst->seqNum, predHist[tid].size()); > > return pred_taken; > } > > Does my story seems right and is my correction is in place? > > Thanks, > Yuval. > _______________________________________________ > gem5-users mailing list > [email protected] > http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users > > > > _______________________________________________ > gem5-users mailing list > [email protected] > http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users -- - Korey _______________________________________________ gem5-users mailing list [email protected] http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users _______________________________________________ gem5-users mailing list [email protected] http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
