Does anybody have any comments? I'd like to address any issues sooner
rather than later.

Gabe

Gabe Black wrote:
>     Hi everybody. I've been doing a bunch of work on our ARM support,
> and I've run into a few issues unique (or at least especially present)
> to that ISA. I have solutions that I've partially implemented, but I
> wanted to talk about them here so everyone will have a chance to comment.
>
> 1. Many (most?) instructions have more than one encoding: some for
> 16-bit Thumb, some for 32-bit Thumb, and some for 32-bit ARM
> proper. These encodings can be substantially different both in where the
> relevant fields are, and in how the instruction itself is identified. To
> handle this situation, I borrowed an approach I used with x86 and
> separated the actual decoder specification from the instruction
> definitions. The instruction classes themselves are set up in let
> blocks, conceptually very similarly to how they would be set up with
> formats. The formats then just reference the right classes. To help deal
> with the variability in how operands are specified, the instructions get
> their register indices and immediates directly from their constructor
> arguments rather than relying on their internal copy of the machInst. I
> have most loads and stores reimplemented this way, and it's working out
> well so far.
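To make point 1 a bit more concrete, here is a rough sketch of the idea, not the actual M5 code. The names LoadImm, bits, decodeArmLdrImm and decodeThumb16LdrImm are made up for illustration, and the field positions are my reading of the LDR (immediate) encodings, so check them against the ARM manual before relying on them. The point is only that two very different encodings end up constructing the same class.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical instruction class: operands arrive as constructor
// arguments, so the class never inspects the raw machine word itself.
struct LoadImm
{
    uint8_t dest;   // destination register index
    uint8_t base;   // base register index
    uint32_t imm;   // byte offset

    LoadImm(uint8_t _dest, uint8_t _base, uint32_t _imm)
        : dest(_dest), base(_base), imm(_imm)
    {}
};

// Extract bits [last:first] (inclusive) of val.
inline uint32_t
bits(uint32_t val, int last, int first)
{
    assert(last >= first && last < 32 && first >= 0);
    const int nbits = last - first + 1;
    const uint32_t mask = nbits == 32 ? 0xFFFFFFFFu : ((1u << nbits) - 1);
    return (val >> first) & mask;
}

// Assumed 32-bit ARM layout: Rn in [19:16], Rt in [15:12], imm12 in [11:0].
// The P/U/W bits are ignored here for brevity.
inline LoadImm
decodeArmLdrImm(uint32_t machInst)
{
    return LoadImm(bits(machInst, 15, 12), bits(machInst, 19, 16),
                   bits(machInst, 11, 0));
}

// Assumed 16-bit Thumb layout: imm5 in [10:6] (scaled by 4), Rn in [5:3],
// Rt in [2:0].
inline LoadImm
decodeThumb16LdrImm(uint16_t machInst)
{
    return LoadImm(bits(machInst, 2, 0), bits(machInst, 5, 3),
                   bits(machInst, 10, 6) << 2);
}
```

Both decode functions hand back the same LoadImm, so the execute code only has to be written once per instruction rather than once per encoding.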
>
> 2. ARM's decoding structure is a poor fit for the
> switch-on-a-bitfield sort of decoding the isa_parser supports. As a
> result, the decoder gets really complicated, and it's hard to tell
> looking at the manual and the decoder specification what matches with
> what. That not only makes it hard to fix problems and easy to introduce
> bugs, it also probably results in a slow decoder from all the
> unnecessary nesting of switch statements. The problem basically stems
> from the fact that the encodings may select instructions based on a 1
> being in bit 3 of some field, and if not then bits 1 and 2 being zero,
> and if so if the destination register is the PC, and if not if it's
> Tuesday, etc. Relatedly, the bits that are used may not be contiguous.
> There is no way to mask out certain bits in the value to switch on
> without playing games with the ExtMachInst type and maybe also the
> predecoder, further obfuscating things and making life difficult.
>
> Also, a sort of subproblem here is that the bitfields are not generally
> mnemonically named. There are many, many fields named "op1" or "opa" and
> they move around depending on what diagram you're looking at. Defining all
> those bitfields is cumbersome and messy.
>
> What I've done to address this problem is to pull the actual meat of the
> decoder out of the decoder construct and into the formats themselves.
> The decode_block of a particular format is built up in python to decode
> to the right classes (mentioned in 1), and the format is only used in
> one place to bring in that code. Eventually I imagine most of the
> decoder being handled this way. I've implemented this with the loads and
> stores I mentioned above, and it's also worked out well so far. To
> handle the bitfield problem, I just define some temporaries at the top
> of the decode_block called op1, etc. that are scoped just to that block.
> Nothing else needs to worry about what bits hold op1 for the loads of a
> single data item in the 16 bit arm encodings with 8 bit immediates.
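For illustration, the kind of decode_block body I mean in point 2 might look roughly like the following once generated. This is a sketch under assumptions: the encoding group, the field positions and the Kind values are all invented, and the real block would return instruction objects rather than an enum. It shows the two pieces described above: temporaries named after the manual's diagram that are scoped to this one block, and an if/else chain that can test individual or non-contiguous bits.

```cpp
#include <cassert>
#include <cstdint>

// Extract bits [last:first] (inclusive) of val.
inline uint32_t
bits(uint32_t val, int last, int first)
{
    assert(last >= first && last < 32 && first >= 0);
    const int nbits = last - first + 1;
    const uint32_t mask = nbits == 32 ? 0xFFFFFFFFu : ((1u << nbits) - 1);
    return (val >> first) & mask;
}

// Hypothetical outcomes for one made-up group of encodings.
enum class Kind { LoadReg, BranchLike, StoreImm, Extend, Unknown };

// Hypothetical decode block for that group.
inline Kind
decodeExampleGroup(uint32_t machInst)
{
    // Temporaries that exist only inside this block. Another block is
    // free to define its own "op1" over completely different bits.
    const uint32_t op1 = bits(machInst, 24, 21);
    // A field built from non-contiguous bits: bit 20 joined with [6:5].
    const uint32_t op2 =
        (bits(machInst, 20, 20) << 2) | bits(machInst, 6, 5);
    const uint32_t rd = bits(machInst, 15, 12);
    const uint32_t pcIndex = 15;

    if (op1 & 0x8) {
        // Bit 3 of op1 is set.
        return Kind::LoadReg;
    } else if ((op1 & 0x6) == 0) {
        // Bits 1 and 2 of op1 are both zero; the result then depends on
        // whether the destination register is the PC.
        return rd == pcIndex ? Kind::BranchLike : Kind::StoreImm;
    } else if (op2 == 0x5) {
        return Kind::Extend;
    }
    return Kind::Unknown;
}
```

Written this way the tests can follow the manual's tables row by row, which is hard to do with nested switch statements over contiguous fields.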
>
> 3. ARM has modes, and these can in many cases be transitioned between by
> regular branches. Even arithmetic instructions can be that kind of
> branch, so there are many instructions that can affect the way following
> instructions are decoded. It's not practical to hold up decode to see
> what happens in all those cases. To handle this I commandeered a few
> bits of the 64 bit PC (ARM's PC is 32 bits) to represent the mode. If
> the mode is switched when it shouldn't be, or not switched when it
> should be, it looks like a branch mispredict and is handled with the
> usual mechanisms. The predecoder
> takes the right bits of the PC and uses them to interpret the incoming
> instruction bytes correctly, and puts those bits into the ExtMachInst so
> the decoder can contextualize how to decode things. While I have this
> implemented and working as far as I can tell, there are two (at least)
> problems with it.
>
> First, the instruction tracer doesn't understand that a few bits of the
> PC are artificial, and that breaks where it tries to print the PC as a
> symbol plus an offset. Second, there's nothing special about those bits
> in the PC. It may be possible through some fluke to change them
> unintentionally through a branch with a weird target, etc., and change
> the mode. That's unlikely and the simulated code would likely have to
> try hard to break things, but I'd like for it not to even be possible.
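A minimal sketch of the PC trick in point 3, assuming a single Thumb bit placed just above the 32-bit architectural PC. The bit position and the names ThumbBit, makePc, isThumb, archPc and mispredicted are assumptions for illustration, not what is in my tree.

```cpp
#include <cassert>
#include <cstdint>

using Addr = uint64_t;

// Assumed layout: the low 32 bits are the architectural PC and bit 32
// carries the mode.
constexpr Addr ArchPcMask = 0xFFFFFFFFull;
constexpr Addr ThumbBit = Addr(1) << 32;

// Build a simulator PC from an architectural address and a mode.
inline Addr
makePc(Addr addr, bool thumb)
{
    assert((addr & ~ArchPcMask) == 0);
    return addr | (thumb ? ThumbBit : 0);
}

// What the predecoder would look at to pick an interpretation of the
// incoming instruction bytes.
inline bool
isThumb(Addr pc)
{
    return (pc & ThumbBit) != 0;
}

// What a tracer would need to use for symbol lookup, with the
// artificial bits stripped off.
inline Addr
archPc(Addr pc)
{
    return pc & ArchPcMask;
}

// Comparing whole PCs means a wrong mode is caught exactly like a
// wrong target address.
inline bool
mispredicted(Addr predictedPc, Addr actualPc)
{
    return predictedPc != actualPc;
}
```

The second problem shows up here too: nothing stops an ordinary 64 bit target value from having bit 32 set, which is why I'd rather keep this state somewhere other than the PC.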
>
> A more general solution to this problem would be to introduce some sort
> of speculative decode state mechanism. That would work basically the
> same as this does, but without piggybacking on the PC and branches. That
> would also be potentially useful for SPARC and its register windows. I
> was thinking about (and may have mentioned) trying to combine all the
> PCs into one structure so there wouldn't have to be so many function
> calls to update them. Perhaps that would be a good place for this state?
> That would also potentially hide whether an ISA has branch delay slots,
> and let, say, MIPS define a function to increment the PC without
> worrying about the microcode PCs.
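One possible shape for that combined structure, purely as a strawman. The type name PCState, its fields and the advance() helper are all hypothetical; an ISA with delay slots or microcode would presumably supply its own version.

```cpp
#include <cassert>
#include <cstdint>

using Addr = uint64_t;
using MicroPC = uint16_t;

// Hypothetical bundle of everything that determines what gets fetched
// and how it gets decoded.
struct PCState
{
    Addr pc;            // current instruction address
    Addr npc;           // next instruction address
    MicroPC upc;        // current micro-op index
    MicroPC nupc;       // next micro-op index
    uint8_t decodeMode; // speculative decode state, e.g. ARM vs. Thumb

    // Two states match only if the addresses, micro PCs and the decode
    // mode all match, so a wrong mode is detected like a wrong target.
    bool
    operator==(const PCState &other) const
    {
        return pc == other.pc && npc == other.npc &&
               upc == other.upc && nupc == other.nupc &&
               decodeMode == other.decodeMode;
    }

    // Step to the next sequential instruction in a single call. The
    // caller does not need to know about the micro PCs.
    void
    advance(Addr instSize)
    {
        assert(instSize > 0);
        pc = npc;
        npc = pc + instSize;
        upc = 0;
        nupc = 1;
    }
};
```

With something like this, the CPU models would copy and compare one object instead of making a separate call per PC, and the mode would no longer live in the address bits.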
>
>
> I didn't include any patches with this email because I'm still in the
> process of implementing these changes, and I didn't want to send out a
> lot of patches that would be out of date right away. I'm guessing
> there's still plenty to talk about at the conceptual level. Please let
> me know what you guys think.
>
> Gabe
> _______________________________________________
> m5-dev mailing list
> [email protected]
> http://m5sim.org/mailman/listinfo/m5-dev
>   

