Hi,
When you updated your pipeline to 12 stages, did you also update the
per-pipeline instruction schedule? Should be
createFrontSked()/createBackSked() functions in cpu.cc. There was a
thread on this a while back that I replied to.

In the event, that doesnt work, give me a day or two to look at the
trace a little more closely and speculate on  fix.

On Sun, Jul 8, 2012 at 10:10 AM, Ali Saidi <[email protected]> wrote:
> It might be… Korey, any thoughts?
>
> I'm not that familiar with the in order cpu, but it either needs to store
> the predicted address or one previous to it in all cases. My only worry with
> this change is that with the RAS is used you'll end up unnecessarily
> incrementing the PC and mispredicting there.
>
> Ali
>
> On Jul 5, 2012, at 5:48 AM, Yuval H. Nacson wrote:
>
> Hello,
>
> I tried to run a very simple program in the InOrder environment.
> I extended the pipe to 12 stages, and also moved the prediction and fetch
> update to the first stage.
>
> j1: subl $2, 1, $2;
> addl $1, 4, $1;
> bgt $2, j1;
>
> This loop runs for a million iterations.
> The cpi for this part is expected to be 1 or very close to, but it is 4.
>
> Looking in the debug flags I see the following:
> Cycle n:
> 9522195: system.cpu.branch_predictor: [tid:0]: [sn:1002] requesting this
> resource.
> 9522195: system.cpu.branch_predictor: [tid:0] [sn:1002] bge
> r18,0x120008220 ... PC (0x120008238=>0x12000823c) doing branch prediction
> 9522195: system.cpu.branch_predictor: [tid:0]: [asid:0] Instruction
> (0x120008238=>0x12000823c) predicted target is (0x120008238=>0x120008220).
> 9522195: system.cpu.branch_predictor: [tid:0]: [sn:1002]: Setting Predicted
> PC to (0x120008238=>0x120008220).
> 9522195: system.cpu.branch_predictor: [tid:0] [sn:1002] pushed onto front of
> predHist ...predHist.size(): 1
> 9522195: system.cpu.branch_predictor: [tid:0]: [sn:1002]: Branch predicted
> true.
> 9522195: system.cpu.branch_predictor: [tid:0]: [sn:1002]: bge Predicted PC
> is (0x120008238=>0x120008220).
> 9522195: system.cpu.branch_predictor[slot:0]:: done with request from
> [sn:1002] [tid:0].
> 9522195: system.cpu.stage0: [tid:0]: [sn:1002] request to
> system.cpu.branch_predictor completed.
> 9522195: system.cpu.branch_predictor: Deallocating [slot:0].
> 9522195: system.cpu.stage0: [tid:0]: [sn:1002]: sending request to
> system.cpu.fetch_seq_unit.
> 9522195: system.cpu.fetch_seq_unit: Allocating [slot:1] for [tid:0]:
> [sn:1002]
> 9522195: system.cpu.fetch_seq_unit: [tid:0]: [sn:1002] requesting this
> resource.
> 9522195: system.cpu.fetch_seq_unit: [tid:0] Setting up squash to start from
> stage 0, after [sn:0].
> 9522195: system.cpu.fetch_seq_unit[slot:1]:: done with request from
> [sn:1002] [tid:0].
> 9522195: system.cpu.stage0: [tid:0]: [sn:1002] request to
> system.cpu.fetch_seq_unit completed.
> 9522195: system.cpu.fetch_seq_unit: Deallocating [slot:1].
> 9522195: system.cpu.stage0: [tid:0]: Attempting to send instructions to
> stage 1.
> 9522195: system.cpu.stage0: [tid:0] 1 slots available in next stage buffer.
> 9522195: system.cpu.stage0: [tid:0]: [sn:1002]: being placed into  index 0
> of stage buffer 0 queue.
> 9522195: system.cpu.stage0: Done Processing [tid:0]
> 9522195: system.cpu.stage0:
>
> Cycle n+1:
> 9522612: system.cpu.fetch_seq_unit: Allocating [slot:0] for [tid:0]:
> [sn:1003]
> 9522612: system.cpu.fetch_seq_unit: [tid:0]: instruction requesting this
> resource.
> 9522612: system.cpu.fetch_seq_unit: [tid:0]: Current PC is
> (0x120008238=>0x120008220)
> 9522612: system.cpu.fetch_seq_unit: [tid:0]: Assigning [sn:1003] to PC
> (0x120008238=>0x120008220)
> 9522612: system.cpu.fetch_seq_unit[slot:0]:: done with request from
> [sn:1003] [tid:0].
>
> I cleaned a lot of unnecessary messeges for clarity. But the Idea is that
> the branch command does not update the fetch correctly. The predictor has
> problems predicting the second command claiming that the BTB missing an
> entry for this command and so the branch is predicted as not taken
> (reminder: this is the second consecutive appearance of this branch).
> Later on stage 10 (out of 12 in my pipe) the second branch is being flushed
> because of miss prediction. And it's happens all the time causing my CPI to
> be 4 times it's calculated value.
>
> I located the problem in the bpred_unit.cc file. And after I receive a
> prediction from the predictor I advance it making the fetch unit not
> fetching the second branch. Now all is well.
>
> I attach my change (It's towards the end):
>
> bool
> BPredUnit::predict(DynInstPtr &inst, TheISA::PCState &predPC, ThreadID tid)
> {
>     // See if branch predictor predicts taken.
>     // If so, get its target addr either from the BTB or the RAS.
>     // Save off record of branch stuff so the RAS can be fixed
>     // up once it's done.
>
>     using TheISA::MachInst;
>
>     int asid = inst->asid;
>     bool pred_taken = false;
>     TheISA::PCState target;
>
>     ++lookups;
>     DPRINTF(InOrderBPred, "[tid:%i] [sn:%i] %s ... PC %s doing branch "
>             "prediction\n", tid, inst->seqNum,
>             inst->staticInst->disassemble(inst->instAddr()),
>             inst->pcState());
>
>
>     void *bp_history = NULL;
>
>     if (inst->isUncondCtrl()) {
>         DPRINTF(InOrderBPred, "[tid:%i] Unconditional control.\n",
>                 tid);
>         pred_taken = true;
>         // Tell the BP there was an unconditional branch.
>         BPUncond(bp_history);
>
>         if (inst->isReturn() && RAS[tid].empty()) {
>             DPRINTF(InOrderBPred, "[tid:%i] RAS is empty, predicting "
>                     "false.\n", tid);
>             pred_taken = false;
>         }
>     } else {
>         ++condPredicted;
>
>         pred_taken = BPLookup(predPC.instAddr(), bp_history);
>     }
>
>     PredictorHistory predict_record(inst->seqNum, predPC, pred_taken,
>                                     bp_history, tid);
>
>     // Now lookup in the BTB or RAS.
>     if (pred_taken) {
>         if (inst->isReturn()) {
>             ++usedRAS;
>
>             // If it's a function return call, then look up the address
>             // in the RAS.
>             TheISA::PCState rasTop = RAS[tid].top();
>             target = TheISA::buildRetPC(inst->pcState(), rasTop);
>
>             // Record the top entry of the RAS, and its index.
>             predict_record.usedRAS = true;
>             predict_record.RASIndex = RAS[tid].topIdx();
>             predict_record.rasTarget = rasTop;
>
>             assert(predict_record.RASIndex < 16);
>
>             RAS[tid].pop();
>
>             DPRINTF(InOrderBPred, "[tid:%i]: Instruction %s is a return, "
>                     "RAS predicted target: %s, RAS index: %i.\n",
>                     tid, inst->pcState(), target,
>                     predict_record.RASIndex);
>         } else {
>             ++BTBLookups;
>
>             if (inst->isCall()) {
>
>                 RAS[tid].push(inst->pcState());
>
>                 // Record that it was a call so that the top RAS entry can
>                 // be popped off if the speculation is incorrect.
>                 predict_record.wasCall = true;
>
>                 DPRINTF(InOrderBPred, "[tid:%i]: Instruction %s was a call"
>                         ", adding %s to the RAS index: %i.\n",
>                         tid, inst->pcState(), predPC,
>                         RAS[tid].topIdx());
>             }
>
>             if (inst->isCall() &&
>                 inst->isUncondCtrl() &&
>                 inst->isDirectCtrl()) {
>                 target = inst->branchTarget();
>             } else if (BTB.valid(predPC.instAddr(), asid)) {
>                 ++BTBHits;
>
>                 // If it's not a return, use the BTB to get the target addr.
>                 target = BTB.lookup(predPC.instAddr(), asid);
>
>                 DPRINTF(InOrderBPred, "[tid:%i]: [asid:%i] Instruction %s "
>                         "predicted target is %s.\n",
>                         tid, asid, inst->pcState(), target);
>             } else {
>                 DPRINTF(InOrderBPred, "[tid:%i]: BTB doesn't have a "
>                         "valid entry, predicting false.\n",tid);
>                 pred_taken = false;
>             }
>         }
>     }
>
>     if (pred_taken) {
>         // Set the PC and the instruction's predicted target.
>         predPC = target;
>                 TheISA::advancePC(predPC, inst->staticInst);  ß
>     }
>     DPRINTF(InOrderBPred, "[tid:%i]: [sn:%i]: Setting Predicted PC to
> %s.\n",
>             tid, inst->seqNum, predPC);
>
>     predHist[tid].push_front(predict_record);
>
>     DPRINTF(InOrderBPred, "[tid:%i] [sn:%i] pushed onto front of predHist "
>             "...predHist.size(): %i\n",
>             tid, inst->seqNum, predHist[tid].size());
>
>     return pred_taken;
> }
>
> Does my story seems right and is my correction is in place?
>
> Thanks,
> Yuval.
> _______________________________________________
> gem5-users mailing list
> [email protected]
> http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
>
>
>
> _______________________________________________
> gem5-users mailing list
> [email protected]
> http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users



-- 
- Korey
_______________________________________________
gem5-users mailing list
[email protected]
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users

Reply via email to