Hi everybody. I've been doing a bunch of work on our ARM support, and I've run into a few issues unique to (or at least especially prominent in) that ISA. I have solutions that I've partially implemented, but I wanted to talk about them here so everyone will have a chance to comment.
1. Many (most?) instructions have more than one encoding: some for 16-bit-wide Thumb, some for 32-bit-wide Thumb, and some for 32-bit ARM proper. These encodings can differ substantially both in where the relevant fields are and in how the instruction itself is identified. To handle this, I borrowed an approach I used with x86 and separated the actual decoder specification from the instruction definitions. The instruction classes themselves are set up in let blocks, conceptually much like they would be set up with formats, and the formats then just reference the right classes. To deal with the variability in how operands are specified, the instructions get their register indices and immediates directly from their constructor arguments rather than relying on their internal copy of the machInst. I have most loads and stores reimplemented this way, and it's working out well so far.

2. ARM's decoding structure is very incompatible with the switch-on-a-bitfield sort of decoding the isa_parser supports. As a result, the decoder gets really complicated, and it's hard to tell, looking at the manual and the decoder specification, what matches with what. That not only makes it hard to fix problems and easy to introduce bugs, it also probably results in a slow decoder from all the unnecessary nesting of switch statements. The problem basically stems from the fact that the encodings may select instructions based on a 1 being in bit 3 of some field, and if not then bits 1 and 2 being zero, and if so whether the destination register is the PC, and if not whether it's Tuesday, etc. Relatedly, the bits that are used may not be contiguous. There is no way to mask out certain bits in the value to switch on without playing games with the ExtMachInst type and maybe also the predecoder, further obfuscating things and making life difficult. Also, a sort of subproblem here is that the bitfields are not generally mnemonically named.
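To make the non-contiguous-bits problem concrete, here's a small Python sketch. The encoding, field positions, and form names are all invented for illustration (they're not from the ARM manual or from M5); the point is that reassembling a scattered field and then selecting via a cascade of predicates is exactly what a single switch on one contiguous bitfield can't express.

```python
def bits(word, hi, lo):
    """Extract bits hi..lo (inclusive) of word."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)

def decode_op1(mach_inst):
    # Hypothetical encoding: "op1" lives in bits 23..22 plus bit 6,
    # so it has to be reassembled before any selection can happen.
    op1 = (bits(mach_inst, 23, 22) << 1) | bits(mach_inst, 6, 6)
    # The selection logic is a cascade of predicates, not a table:
    if op1 & 0b100:                        # a 1 in the field's top bit
        return "form_a"
    elif bits(mach_inst, 2, 1) == 0:       # otherwise, bits 1 and 2 zero
        return "form_b"
    elif bits(mach_inst, 15, 12) == 0xF:   # otherwise, destination is the PC
        return "form_c"
    else:
        return "form_d"
```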
There are many, many fields named "op1" or "opa", and they move around depending on what diagram you're looking at. Defining all those bitfields is cumbersome and messy. What I've done to address this is to pull the actual meat of the decoder out of the decoder construct and into the formats themselves. The decode_block of a particular format is built up in Python to decode to the right classes (mentioned in 1), and the format is only used in one place to bring in that code. Eventually I imagine most of the decoder being handled this way. I've implemented this with the loads and stores mentioned above, and it's also worked out well so far. To handle the bitfield problem, I just define some temporaries at the top of the decode_block, called op1, etc., that are scoped to that block alone. Nothing else needs to worry about which bits hold op1 for the loads of a single data item in the 16-bit ARM encodings with 8-bit immediates.

3. ARM has modes, and these can in many cases be switched by regular branches. Even arithmetic instructions can be that kind of branch, so there are many instructions that can affect how following instructions are decoded. It's not practical to hold up decode to see what happens in all those cases. To handle this I commandeered a few bits of the 64-bit PC (ARM's PC is 32 bits) to represent the mode. If the mode is switched, or not switched, incorrectly, it looks like a branch mispredict and is handled with the usual mechanisms. The predecoder takes the right bits of the PC and uses them to interpret the incoming instruction bytes correctly, and puts those bits into the ExtMachInst so the decoder knows the context in which to decode things. While I have this implemented and working as far as I can tell, there are (at least) two problems with it. First, the instruction tracer doesn't understand that a few bits of the PC are artificial, which breaks where it tries to print the PC as a symbol plus an offset.
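The mode-bits-in-the-PC scheme can be sketched roughly like this. The bit positions, mode values, and helper names here are invented for illustration, not the actual M5 implementation; the idea is just that the architectural 32-bit PC leaves the high bits of the simulator's 64-bit PC free to carry decode mode.

```python
# Invented layout: two high bits of the 64-bit PC hold the decode mode.
MODE_SHIFT = 62
MODE_MASK = 0x3 << MODE_SHIFT
ARM, THUMB = 0, 1

def set_mode(pc, mode):
    """Stash the mode in the PC's high bits, e.g. when a branch switches modes."""
    return (pc & ~MODE_MASK) | (mode << MODE_SHIFT)

def mode_of(pc):
    """What the predecoder reads to interpret the incoming instruction bytes."""
    return (pc & MODE_MASK) >> MODE_SHIFT

def arch_pc(pc):
    """The architectural 32-bit PC, with the artificial bits stripped.
    This is what a tracer would need to print as symbol + offset."""
    return pc & 0xFFFFFFFF
```

A wrong mode then shows up as a PC mismatch and gets squashed like any other branch mispredict.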
Second, there's nothing special about those bits in the PC. It may be possible, through some fluke, to change them unintentionally with a branch to a weird target, etc., and so change the mode. That's unlikely, and the simulated code would probably have to try hard to break things, but I'd like for it not to even be possible. A more general solution would be to introduce some sort of speculative decode-state mechanism. That would work basically the same way this does, but without piggybacking on the PC and branches. It would also be potentially useful for SPARC and its register windows. I was thinking about (and may have mentioned) trying to combine all the PCs into one structure so there wouldn't have to be so many function calls to update them. Perhaps that would be a good place for this state? It would also potentially hide whether an ISA has branch delay slots, and let, say, MIPS define a function to increment the PC without worrying about the microcode PCs.

I didn't include any patches with this email because I'm still in the process of implementing these changes, and I didn't want to send out a lot of patches that would be out of date right away. I'm guessing there's still plenty to talk about at the conceptual level. Please let me know what you guys think.

Gabe

_______________________________________________
m5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/m5-dev
