Hi everybody. I've been doing a bunch of work on our ARM support,
and I've run into a few issues unique to (or at least especially
prominent in) that ISA. I have solutions that I've partially implemented, but I
wanted to talk about them here so everyone will have a chance to comment.

1. Many (most?) instructions have more than one encoding: some for
16-bit-wide Thumb, some for 32-bit-wide Thumb, and some for 32-bit ARM
proper. These encodings can differ substantially both in where the
relevant fields are, and in how the instruction itself is identified. To
handle this situation, I borrowed an approach I used with x86 and
separated the actual decoder specification from the instruction
definitions. The instruction classes themselves are set up in let
blocks, conceptually very similar to how they would be set up with
formats. The formats then just reference the right classes. To help deal
with the variability in how operands are specified, the instructions get
their register indices and immediates directly from their constructor
arguments rather than relying on their internal copy of the machInst. I
have most loads and stores reimplemented this way, and it's working out
well so far.
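To make the shape of that concrete, here is a hedged sketch in plain
Python (the class, function names, and field positions are all invented
for illustration; this is not the actual ISA description code): one
instruction class shared by several encodings, taking its register
indices and immediate as explicit constructor arguments, so each
encoding's decode function extracts fields from wherever that encoding
happens to put them.

```python
class LoadImm:
    """A load with an immediate offset, independent of any one encoding.

    Operands come in through the constructor, so the class never has to
    dig fields out of its own copy of the machine instruction.
    """
    def __init__(self, rt, rn, imm):
        self.rt, self.rn, self.imm = rt, rn, imm

def bits(val, hi, lo):
    """Extract bits hi..lo (inclusive) of val."""
    return (val >> lo) & ((1 << (hi - lo + 1)) - 1)

# Two different encodings decode to the same class; only the field
# positions differ (the positions below are made up for illustration).
def decode_thumb16_load(machInst):
    return LoadImm(rt=bits(machInst, 2, 0),
                   rn=bits(machInst, 5, 3),
                   imm=bits(machInst, 10, 6) << 2)

def decode_arm_load(machInst):
    return LoadImm(rt=bits(machInst, 15, 12),
                   rn=bits(machInst, 19, 16),
                   imm=bits(machInst, 11, 0))
```

The point is that LoadImm itself never cares which encoding it came
from; only the per-encoding decode functions know the bit layouts.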

2. ARM's decoding structure is very incompatible with the
switch-on-a-bitfield sort of decoding the isa_parser supports. As a
result, the decoder gets really complicated, and it's hard to tell
looking at the manual and the decoder specification what matches with
what. That not only makes it hard to fix problems and easy to introduce
bugs, it also probably results in a slow decoder from all the
unnecessary nesting of switch statements. The problem basically stems
from the fact that the encodings may select instructions based on a 1
being in bit 3 of some field; if not, then on bits 1 and 2 being zero;
if so, on whether the destination register is the PC; if not, on whether
it's Tuesday, etc. Relatedly, the bits that are used may not be contiguous.
There is no way to mask out certain bits in the value to switch on
without playing games with the ExtMachInst type and maybe also the
predecoder, further obfuscating things and making life difficult.

Also, a sort of subproblem here is that the bitfields are not generally
mnemonically named. There are many, many fields named "op1" or "opa",
and they move around depending on which diagram you're looking at. Defining all
those bitfields is cumbersome and messy.

What I've done to address this problem is to pull the actual meat of the
decoder out of the decoder construct and into the formats themselves.
The decode_block of a particular format is built up in python to decode
to the right classes (mentioned in 1), and the format is only used in
one place to bring in that code. Eventually I imagine most of the
decoder being handled this way. I've implemented this with the loads and
stores I mentioned above, and it's also worked out well so far. To
handle the bitfield problem, I just define some temporaries at the top
of the decode_block called op1, etc. that are scoped just to that block.
Nothing else needs to worry about which bits hold op1 for the loads of a
single data item in the 16-bit Thumb encodings with 8-bit immediates.
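A rough illustration of the scheme (the field positions and class names
are invented, not taken from the real decoder): the decode logic for one
region of the encoding space lives in ordinary Python, so it can test
non-contiguous bits and arbitrary nested conditions instead of a single
switch, and the "op1"-style temporaries exist only inside that function.

```python
def bits(val, hi, lo):
    """Extract bits hi..lo (inclusive) of val."""
    return (val >> lo) & ((1 << (hi - lo + 1)) - 1)

def decode_some_region(machInst):
    # Temporaries named after the manual's diagram, scoped to this
    # block; nothing outside ever sees these definitions of op1/op2.
    op1 = bits(machInst, 24, 23)
    op2 = bits(machInst, 7, 4)
    rd = bits(machInst, 15, 12)
    # Arbitrary, possibly nested conditions rather than one switch:
    if op1 & 0b10:          # bit 1 of op1 set
        if rd == 15:        # destination register is the PC
            return "BranchLike"
        return "Form1"
    elif op2 == 0:
        return "Form2"
    return "Unknown"
```

Another region of the encoding space would get its own function with its
own, possibly conflicting, definitions of op1 and op2.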

3. ARM has modes, and in many cases regular branches can transition
between them. Even arithmetic instructions can act as that kind of
branch, so many instructions can affect the way following
instructions are decoded. It's not practical to hold up decode to see
what happens in all those cases. To handle this I commandeered a few
bits of the 64-bit PC (ARM's PC is 32 bits) to represent the mode. If
the mode is incorrectly switched (or incorrectly left unswitched), it
looks like a branch mispredict and is handled with the usual mechanisms. The predecoder
takes the right bits of the PC and uses them to interpret the incoming
instruction bytes correctly, and puts those bits into the ExtMachInst so
the decoder can contextualize how to decode things. While I have this
implemented and working as far as I can tell, there are two (at least)
problems with it.
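Here is a hedged sketch of the PC-bit trick (the bit positions and
number of mode bits are assumptions for illustration, not the actual
layout): the architectural PC occupies the low 32 bits of a 64-bit
value, and a couple of spare high bits carry the decode mode.

```python
MODE_SHIFT = 32
MODE_MASK = 0x3          # e.g. two mode bits (exact count is assumed)
PC_MASK = 0xFFFFFFFF     # ARM's architectural PC is 32 bits

def pack_pc(pc, mode):
    """Combine the architectural PC with the mode bits into one value."""
    return (pc & PC_MASK) | ((mode & MODE_MASK) << MODE_SHIFT)

def arch_pc(packed):
    """The real 32-bit PC, which is what a tracer ought to print."""
    return packed & PC_MASK

def decode_mode(packed):
    """The mode bits the predecoder uses to interpret fetched bytes."""
    return (packed >> MODE_SHIFT) & MODE_MASK
```

Both problems below fall out of this picture: anything that prints the
packed value as a PC gets the mode bits mixed in, and anything that can
write the packed value can accidentally change the mode.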

First, the instruction tracer doesn't understand that a few bits of the
PC are artificial, and that breaks where it tries to print the PC as a
symbol plus an offset. Second, there's nothing special about those bits
in the PC. Some fluke, like a branch with a weird target, could change
them unintentionally and thereby switch the mode. That's unlikely, and
the simulated code would probably have to
try hard to break things, but I'd like for it not to even be possible.

A more general solution to this problem would be to introduce some sort
of speculative decode state mechanism. That would work basically the
same as this does, but without piggybacking on the PC and branches. That
would also be potentially useful for SPARC and its register windows. I
was thinking about (and may have mentioned) trying to combine all the
PCs into one structure so there wouldn't have to be so many function
calls to update them. Perhaps that would be a good place for this state?
That would also potentially hide whether an ISA has branch delay slots,
and let, say, MIPS define a function to increment the PC without
worrying about the microcode PCs.
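One possible shape for that combined structure (purely hypothetical, to
make the idea discussable; the field names and the advance() behavior
are my guesses, not an existing interface): a single object holding
every flavor of PC plus the speculative decode state, with one advance
call so each ISA can hide details like delay slots and microcode PCs.

```python
from dataclasses import dataclass

@dataclass
class PCState:
    pc: int = 0
    npc: int = 4         # next PC; a delay-slot ISA would manage this pair
    upc: int = 0         # microcode PC
    nupc: int = 1
    decode_mode: int = 0 # speculative decode state (e.g. ARM vs. Thumb)

    def advance(self, inst_size=4):
        """A delay-slot-free ISA's advance: step pc/npc, reset micro PCs."""
        self.pc = self.npc
        self.npc = self.pc + inst_size
        self.upc, self.nupc = 0, 1
```

With something like this, MIPS would supply its own advance() that
handles the delay slot, and callers would stop caring how many PCs there
really are.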


I didn't include any patches with this email because I'm still in the
process of implementing these changes, and I didn't want to send out a
lot of patches that would be out of date right away. I'm guessing
there's still plenty to talk about at the conceptual level. Please let
me know what you guys think.

Gabe
_______________________________________________
m5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/m5-dev
