Does anybody have any comments? I'd like to address any issues sooner rather than later.
Gabe

Gabe Black wrote:
> Hi everybody. I've been doing a bunch of work on our ARM support,
> and I've run into a few issues unique (or at least especially present)
> to that ISA. I have solutions that I've partially implemented, but I
> wanted to talk about them here so everyone will have a chance to comment.
>
> 1. Many (most?) instructions have more than one encoding, some for 16
> bit wide thumb, some for 32 bit wide thumb, and some for 32 bit ARM
> proper. These encodings can be substantially different both in where the
> relevant fields are, and in how the instruction itself is identified. To
> handle this situation, I borrowed an approach I used with x86 and
> separated the actual decoder specification from the instruction
> definitions. The instruction classes themselves are set up in let
> blocks, conceptually very similarly to how they would be set up with
> formats. The formats then just reference the right classes. To help deal
> with the variability in how operands are specified, the instructions get
> their register indices and immediates directly from their constructor
> arguments rather than relying on their internal copy of the machInst. I
> have most loads and stores reimplemented this way, and it's working out
> well so far.
>
> 2. ARM's decoding structure is very incompatible with the
> switch-on-a-bitfield sort of decoding the isa_parser supports. As a
> result, the decoder gets really complicated, and it's hard to tell
> looking at the manual and the decoder specification what matches with
> what. That not only makes it hard to fix problems and easy to introduce
> bugs, it also probably results in a slow decoder from all the
> unnecessary nesting of switch statements. The problem basically stems
> from the fact that the encodings may select instructions based on a 1
> being in bit 3 of some field, and if not then bit 1 and 2 being zero,
> and if so if the destination register is the PC, and if not if it's
> Tuesday, etc.
> Relatedly, the bits that are used may not be contiguous.
> There is no way to mask out certain bits in the value to switch on
> without playing games with the ExtMachInst type and maybe also the
> predecoder, further obfuscating things and making life difficult.
>
> Also, a sort of subproblem here is that the bitfields are not generally
> mnemonically named. There are many, many fields named "op1" or "opa" and
> they move around depending on what diagram you're looking at. Defining all
> those bitfields is cumbersome and messy.
>
> What I've done to address this problem is to pull the actual meat of the
> decoder out of the decoder construct and into the formats themselves.
> The decode_block of a particular format is built up in python to decode
> to the right classes (mentioned in 1), and the format is only used in
> one place to bring in that code. Eventually I imagine most of the
> decoder being handled this way. I've implemented this with the loads and
> stores I mentioned above, and it's also worked out well so far. To
> handle the bitfield problem, I just define some temporaries at the top
> of the decode_block called op1, etc. that are scoped just to that block.
> Nothing else needs to worry about what bits hold op1 for the loads of a
> single data item in the 16 bit arm encodings with 8 bit immediates.
>
> 3. ARM has modes, and these can in many cases be transitioned between by
> regular branches. Even arithmetic instructions can be that kind of
> branch, so there are many instructions that can affect the way following
> instructions are decoded. It's not practical to hold up decode to see
> what happens in all those cases. To handle this I commandeered a few
> bits of the 64 bit PC (ARM's PC is 32 bits) to represent the mode. If
> the mode is incorrectly switched or incorrectly left alone, it looks
> like a branch mispredict and is handled with the usual mechanisms.
> The predecoder takes the right bits of the PC and uses them to interpret
> the incoming instruction bytes correctly, and puts those bits into the
> ExtMachInst so the decoder can contextualize how to decode things. While
> I have this implemented and working as far as I can tell, there are two
> (at least) problems with it.
>
> First, the instruction tracer doesn't understand that a few bits of the
> PC are artificial, and that breaks where it tries to print the PC as a
> symbol plus an offset. Second, there's nothing special about those bits
> in the PC. It may be possible through some fluke to change them
> unintentionally through a branch with a weird target, etc., and change
> the mode. That's unlikely and the simulated code would likely have to
> try hard to break things, but I'd like for it not to even be possible.
>
> A more general solution to this problem would be to introduce some sort
> of speculative decode state mechanism. That would work basically the
> same as this does, but without piggybacking on the PC and branches. That
> would also be potentially useful for SPARC and its register windows. I
> was thinking about (and may have mentioned) trying to combine all the
> PCs into one structure so there wouldn't have to be so many function
> calls to update them. Perhaps that would be a good place for this state?
> That would also potentially hide whether an ISA has branch delay slots,
> and let, say, MIPS define a function to increment the PC without
> worrying about the microcode PCs.
>
> I didn't include any patches with this email because I'm still in the
> process of implementing these changes, and I didn't want to send out a
> lot of patches that would be out of date right away. I'm guessing
> there's still plenty to talk about at the conceptual level. Please let
> me know what you guys think.
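
To make the "combine all the PCs into one structure" idea a little more concrete, here is a rough sketch in Python. It is only an illustration under stated assumptions, not code from the tree: the names PCState, advance, branch and mispredicted, and the ARM/THUMB constants, are all hypothetical, and instruction sizes are simplified to 4 bytes for ARM and 2 for Thumb (32 bit Thumb is ignored). The point it tries to show is that the decode mode can live next to the PCs rather than inside spare address bits, while a wrong mode still compares as a mispredict.

```python
from dataclasses import dataclass, replace

# Hypothetical decode modes; the names are illustrative only.
ARM, THUMB = 0, 1


def _size_for(mode: int) -> int:
    # Simplifying assumption: 4 byte ARM, 2 byte Thumb (no 32 bit Thumb).
    return 4 if mode == ARM else 2


@dataclass(frozen=True)
class PCState:
    """Hypothetical bundle of every PC-like value plus decode state."""
    pc: int          # address of the current instruction
    npc: int         # address of the next instruction
    upc: int = 0     # micro-op index within the current instruction
    mode: int = ARM  # speculative decode state, kept out of the address bits

    def advance(self) -> "PCState":
        # Fall through to the next instruction; ISA specific stepping
        # (delay slots, microcode PCs) would be hidden in here.
        return replace(self, pc=self.npc,
                       npc=self.npc + _size_for(self.mode), upc=0)

    def branch(self, target: int, mode: int) -> "PCState":
        # A taken branch may change the decode mode as well as the address.
        return PCState(pc=target, npc=target + _size_for(mode),
                       upc=0, mode=mode)


def mispredicted(predicted: PCState, actual: PCState) -> bool:
    # Dataclass equality compares every field, so a wrong mode is caught
    # the same way as a wrong target address.
    return predicted != actual


start = PCState(pc=0x8000, npc=0x8004)
guess = start.branch(0x9000, ARM)     # predictor assumed no mode change
real = start.branch(0x9000, THUMB)    # the branch actually switched modes
print(mispredicted(guess, real))
```

Because the mode is its own field, a tracer could print pc as a plain address, and a branch with an odd target could not change the mode by accident.
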
>
> Gabe
> _______________________________________________
> m5-dev mailing list
> [email protected]
> http://m5sim.org/mailman/listinfo/m5-dev
