> On 2011-01-27 00:27:45, Gabe Black wrote: > > Part of the idea of the PCState stuff is that this shouldn't be necessary. > > When I converted InOrder I fudged things a little since fetch wasn't > > structured in a way that made things drop into place and I wasn't sure > > exactly how everything worked, but the details of branch delay slots should > > be totally hidden. From my commit message: > > > > One drawback of making the StaticInsts advance the PC is that you have > > to > > actually have one to advance the PC. This would, superficially, seem to > > require decoding an instruction before fetch could advance. This is, as > > far as > > I can tell, realistic. fetch would advance through memory addresses, > > not PCs, > > perhaps predicting new memory addresses using existing ones. More > > sophisticated decisions about control flow would be made later on, > > after the > > instruction was decoded, and handed back to fetch. If branching needs to > > happen, some amount of decoding needs to happen to see that it's a > > branch, > > what the target is, etc. This could get a little more complicated if > > that gets > > done by the predecoder, but I'm choosing to ignore that for now. > > > > I imagine fetch moving through memory addresses like I said, and then > > perhaps decode (or fetch, if it's decoding the instruction) advancing a > > corresponding PC object that works alongside the StaticInsts being set up. > > Decode would grab the right bytes out of the fetch buffer depending on the > > actual value of instAddr(). The fact that branch delay slots don't just go > > to the next instruction if the delay slot is annulled should be predicted > > by the branch predictor because it associates a PCState with the next > > PCState when branching. > > > > It gets a little confusing and I haven't looked at that stuff for a few > > months, but it worked out for O3. > > Korey Sewell wrote: > I wish I had the exact error trace that I was debugging in front of me, > but I'll give it a go on understanding on then relook at how it affects this > patch. > > Are we saying the branch predictor should be responsible for updating the > NNPC (not taken or taken in the PCState?) > > I believe what I was observing is that the PCState before branch > prediction is: > PC: Addr > NPC: Addr+4 > NNPC: Addr+4 (?, something that wasnt Addr+8) > > But then when an instruction ended up being predicted not-taken, the NNPC > wasnt being updated correctly so we break there. This is because the > predicted PC is set to default be the not taken path achieved by advancing > once on PCState but that doesnt work for the default value when it should > advance 2x. > > I'll try to mark in the review , the problem case. > > (Lastly, was your (Gabe) goal to remove all #ISA_HAS_DELAY_SLOT in the > code?)
You should only be predicting entire PCState objects, not individual components. If the PC doesn't change in the way you expect, then you have a mispredict and you go back and fix things up. The predictor will learn what the entire new PCState looks like after a branch as if it was all one big possibly complex branch target, so it's pc, npc, nnpc, etc will all do what it's supposed to do, possibly after getting it wrong once. The idea was to get rid of the ISA_HAS_DELAY_SLOT type code, also including all the special handling for all the different components PCs might have and the rules for updating them (branch delay slots, variable length instructions, mode changing, microcode). That stuff was getting really unwieldy because it was trying to do everything with the same blob of code, and it had to be reimplemented for each CPU which made it even more error prone and a lot more work to get going. Now, ideally, x86 or ARM's PC semantics would just drop into InOrder and do what they're supposed to do without having to jump in and do a bunch of remodeling. Nothing's perfect, of course, so there might be some issues, but it seemed to go pretty smoothly for O3 and cleared up some fairly serious otherwise undiscovered performance bugs. - Gabe ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: http://reviews.m5sim.org/r/449/#review806 ----------------------------------------------------------- On 2011-01-25 16:04:38, Korey Sewell wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > http://reviews.m5sim.org/r/449/ > ----------------------------------------------------------- > > (Updated 2011-01-25 16:04:38) > > > Review request for Default, Ali Saidi, Gabe Black, Steve Reinhardt, and > Nathan Binkert. > > > Summary > ------- > > inorder: pcstate and delay slots bug > For ISAs with delay slots, not taken branches were not being advanced > correctly to pc+8, so for those ISAs > we 'advance()' the pcstate one more time for the desired effect > > > Diffs > ----- > > src/cpu/inorder/resources/branch_predictor.cc 31a04e5ac4be > src/cpu/inorder/resources/fetch_seq_unit.cc 31a04e5ac4be > > Diff: http://reviews.m5sim.org/r/449/diff > > > Testing > ------- > > > Thanks, > > Korey > >
_______________________________________________ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev