Quoting Jack Whitham <[email protected]>:

> On Wed, Jun 24, 2009 at 09:30:11PM -0400, Gabriel Michael Black wrote:
>> > Also, is it the case that you can write the PC with any opcode? I vaguely
>> > recall hearing that that was deprecated at some point.
>> >
>>
>> I don't know. I saw some places in the ARM manual where it said some
>> uses of R15 cause undefined behavior. I don't know where to find a
>> concise description of where it is and isn't allowed.
>
> I've also been looking at this problem. I want to get the O3 CPU working with
> the ARM ISA, and I've been making some progress. The two sticking points are
> predicated execution and the PC's "pseudo-GPR" nature. So, hopefully I
> can contribute something to your discussion.
>
> As I see it, the big problem is when the PC is a destination for
> an operation that normally produces a GPR. As a source, you
> can always get the right behaviour quite easily (as you said, + 8):
>>
>> Example reading code:
>>
>> Rn = (RN == PCReg) ? xc->readPC() : xc->readIntRegOperand(this, 0);
>
> I've gone through the ARM ARM, looked at all the integer instructions,
> and the writes to PC *always* fall into one of the following categories:
> 1. Performed via an explicit branch instruction (b, blx, etc.)
> 2. Performed by load multiple (e.g. ldmia).
> 3. Performed by an ALU or load operation that writes to Rd, with Rd = 15.
>    (Rd is machine code bits 15..12 as defined in operands.isa).
>
> Case 1 is easy, same as any other CPU. Case 2 is only slightly more
> difficult, because you can generate a special "write to PC" uop
> resembling a conditional branch.
>
> Case 3 is the difficulty. But it's not so bad, because-
> * The PC is *always* written via Rd. Not Rs, Rm, etc.
> * No other registers are updated (except Cpsr).
>
> Here are the instructions that fall into case 3:
>   ADC    ADD    AND    BIC
>   CLZ    EOR    LDR    LDRB
>   LDRBT  LDRH   LDRSB  LDRSH
>   LDRT   MOV    MVN    ORR
>   RSB    RSC    SBC    SUB
>
> It is possible to specify r15 as the output of some other instructions
> (e.g. MUL, MLA), but the results are not defined by the ISA.
>
> Clearly, there are a number of ways to handle this. Any of these
> instructions could write the PC, creating a branch.
>
> Here is my suggestion for how to handle case 3 instructions. I am
> relatively new to M5, so this may not be the best way, but some
> preliminary tests suggest it will work. First, the control flags need
> to change if the instruction writes to r15:
>
> // base_dyn_inst.hh:
> bool isControl() const
> { return staticInst->isControl() || isPCLoad(); }
> bool isIndirectCtrl() const
> { return staticInst->isIndirectCtrl() || isPCLoad(); }
> bool isCondCtrl() const
> { return staticInst->isCondCtrl() || isPCLoad(); }
>
> The new function isPCLoad() returns true if the ISA is ARM
> and one of the destination registers is 15 (before renaming):
>
> bool isPCLoad(void) const
> {
> #if THE_ISA == ARM_ISA
>     for (int i = 0; i < numDestRegs(); i++) {
>         if (staticInst->destRegIdx(i) == TheISA::PCReg) {
>             return true;
>         }
>     }
> #endif
>     return false;
> }
>
> This way, we don't have to change the ISA definitions for case 3
> instructions. They automatically become branches if rD = 15.

These comments apply to all the above. While I believe you that the manual
says the behavior is undefined if you write to R15 in these other cases,
that doesn't mean the software people will want to run will never attempt
to do that and expect certain behavior. It could be that the software is
old or poorly written (or it's the compiler's fault), but if at all
possible I'd like to support those cases as much as we reasonably can. If
certain cases turn out to be too unreasonable to deal with, for some
definition of reasonable, we can just let those go unless and until they
cause problems. I believe we'll find a general mechanism that will handle
those cases almost as easily as the defined ones.

As far as behaving differently for regular instructions (add, etc.) that
write to R15, we can detect that happening in the decoder and set the
right flag right there. That wouldn't be that hard to do, and if we can
keep ISA-specific code out of the CPU, that would be best.

>
> However, this still leaves the problem of operations that load the new
> PC from memory, such as:
> ldreq pc, [sp], #4
>
> On O3, these are tricky because the new PC is not known until the
> load completes (LSQUnit::writeback), but branch mispredictions are only
> recognised when the load starts (LSQUnit::executeLoad). My suggestion
> for these is to generate a branch misprediction in LSQUnit::writeback,
> if (inst->isPCLoad()&&inst->mispredicted), and don't generate a branch
> misprediction in the regular place if inst->isPCLoad().

The right thing to do here might be to microcode loads that we know are
going to act as branches. The first microop would load, and the second
would actually perform the branch. That would require a little work on
the C++ side of the ISA description, but it wouldn't be -that- painful.
That would hopefully avoid putting any new ISA-specific code in the CPU.
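
To make the two ideas concrete (flagging r15 writes at decode time, and
splitting PC loads into a load microop plus a branch microop), here is a
rough standalone sketch. It is not M5 code: MicroOp, TempReg, decode() and
the local PCReg constant are all made up for illustration, and the real
decoder and microcode interfaces would look different.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration only; none of these are M5 types.
constexpr int PCReg = 15;    // ARM r15
constexpr int TempReg = 16;  // made-up microcode scratch register

struct MicroOp {
    std::string name;  // mnemonic, e.g. "add", "ldr", "branch"
    int dest;          // destination register index
    bool isControl;    // flag a CPU model would query; set by the decoder
    bool isLoad;       // true if the op reads memory
};

// Decode one instruction into one or more micro-ops, given its mnemonic,
// whether it is a load, and its Rd field.
std::vector<MicroOp> decode(const std::string &mnemonic, bool isLoad, int rd)
{
    assert(rd >= 0 && rd <= PCReg);
    std::vector<MicroOp> uops;

    if (rd != PCReg) {
        // Ordinary case: a single op, not a branch.
        uops.push_back(MicroOp{mnemonic, rd, false, isLoad});
    } else if (!isLoad) {
        // ALU op writing r15: same op, but flagged as control here in the
        // decoder, so the CPU model needs no ISA-specific check.
        uops.push_back(MicroOp{mnemonic, PCReg, true, false});
    } else {
        // Load writing r15: the first micro-op loads into a scratch
        // register, the second performs the branch from that register.
        uops.push_back(MicroOp{mnemonic, TempReg, false, true});
        uops.push_back(MicroOp{"branch", PCReg, true, false});
    }
    return uops;
}
```

With this shape, decode("mov", false, 15) yields one op with the control
flag set, and decode("ldr", true, 15) yields a plain load followed by a
branch, so the CPU only ever sees ordinary loads and ordinary branches.
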

Gabe
