On Mon, Jun 29, 2009 at 07:44:26PM -0400, Gabriel Michael Black wrote:
> These comments apply to all the above. While I believe you that the
> manual says the behavior is undefined if you write to R15 in these
> other cases, that doesn't mean that software people will want to run
> will never attempt to do that and expect certain behavior. It could be
> that the software is old or poorly written (or it's the compiler's
> fault), but if at all possible I'd like to support those cases as much
> as we reasonably can. If certain cases turn out to be too unreasonable
> to deal with for some definition of reasonable, we can just let those
> go until and if they cause problems. I believe we'll find a general
> mechanism that will handle those cases almost as easily as the defined
> ones.
This is true, but it must be said that the ARM assembler won't generate
those instructions (it's an error condition), and even the GNU assembler
produces a warning for some of them. So don't worry if they can't easily
be supported; any program containing these instructions is very broken.
> As far as behaving differently for regular instructions (add, etc.)
> that write to R15, we can detect that happening in the decoder and set
> the right flag right there. That wouldn't be that hard to do, and if
> we can keep isa specific code out of the CPU that would be best.
This is a good place to do it.
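As a rough illustration of such a decode-time check, here is a Python
sketch (not M5 code; the names writes_pc, PC_REG and COMPARE_OPCODES are
made up for this example). It assumes the classic 32-bit ARM
data-processing encoding, with the opcode in bits 24:21 and Rd in bits
15:12, and it ignores the multiply and miscellaneous encodings that
share the same top-level bit pattern.

```python
# Hypothetical helper, not M5 code: flag ARM data-processing
# instructions whose destination register is R15 (the PC).
PC_REG = 15
# TST, TEQ, CMP and CMN only set flags and do not write Rd.
COMPARE_OPCODES = {0b1000, 0b1001, 0b1010, 0b1011}


def writes_pc(machine_word):
    """Return True if a data-processing instruction has Rd == R15."""
    # Bits 27:26 == 00 selects the data-processing space (simplified:
    # multiplies and other encodings in this space are not excluded).
    if (machine_word >> 26) & 0b11 != 0b00:
        return False
    opcode = (machine_word >> 21) & 0xF
    if opcode in COMPARE_OPCODES:
        return False
    rd = (machine_word >> 12) & 0xF
    return rd == PC_REG
```

For example, writes_pc(0xE1A0F00E) (mov pc, lr) is true, while
writes_pc(0xE0810002) (add r0, r1, r2) is false. A real decoder would
set a control-flow flag on the static instruction at this point.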
> The right thing to do here might be to microcode loads that we know
> are going to act as branches. The first microop would load, and the
> second would actually perform the branch. That would require a little
> work in the C++ side of the isa description, but it wouldn't be -that-
> painful. That would hopefully avoid putting any new ISA specific code
> in the CPU.
Sounds reasonable. I thought about doing it this way, but wasn't sure
how to do it without major changes in decoder.isa.
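The load-then-branch split could look roughly like the following Python
sketch. This is only a model of the idea, not the M5 microcode
machinery: MicroOp, crack_load and the scratch register TEMP_REG are
all hypothetical names.

```python
from collections import namedtuple

# Hypothetical micro-op record: operation kind, destination register,
# source registers, and whether the CPU should treat it as control flow.
MicroOp = namedtuple("MicroOp", "kind dest srcs is_control")

PC_REG = 15
TEMP_REG = 16  # assumed scratch register, invisible to the program


def crack_load(dest, base):
    """Split a load into micro-ops; loads to the PC become load + branch."""
    if dest != PC_REG:
        return [MicroOp("load", dest, (base,), False)]
    return [
        # First micro-op: an ordinary load into the scratch register.
        MicroOp("load", TEMP_REG, (base,), False),
        # Second micro-op: the only one that is marked as a branch.
        MicroOp("branch", PC_REG, (TEMP_REG,), True),
    ]
```

With this shape, only the second micro-op is a control instruction, so
the CPU model would not need to know that an ARM load can change the PC.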
However - I've hit a related issue, that could also be solved by a large
change to decoder.isa, but probably has a much better solution.
Consider a chain of conditional instructions:
    cmp r0, #1
    cmpne r0, #2
    cmpne r0, #3
Here, instructions 2 and 3 depend on their immediate predecessors. Each
instruction generates a set of flags (Cpsr) which determines whether
the next instruction executes. This limits the available ILP: O3 cannot
reorder or parallelise these instructions, because each uses input from
the previous one.
The same serialisation also occurs, unnecessarily, when the
instructions are *not* conditional, e.g.
    cmp r0, #1
    cmp r0, #2
    cmp r0, #3
Even though the instructions are always executed, instructions 2 and 3
*still* depend on their immediate predecessors because Cpsr is still
an input. The functionality is correct, but inefficient. Really, it
would seem best to flag these instructions as unconditional during decode,
so that the Cpsr input could be ignored. That would permit O3 to reorder
the instructions.
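To make the effect concrete, here is a small Python model of the
read-after-write dependencies (a sketch only, not O3 or decoder code;
the function names and the tuple layout are invented for this example).
It assumes every instruction is described as (condition, source
registers, destination registers), and that an instruction with
condition "al" that overwrites all the flags, like cmp, does not need
Cpsr as an input.

```python
CPSR = "cpsr"


def sources(cond, regs, drop_cpsr_when_unconditional):
    """Source registers of one instruction, including Cpsr if needed."""
    srcs = set(regs)
    if cond != "al" or not drop_cpsr_when_unconditional:
        srcs.add(CPSR)
    return srcs


def raw_dependencies(program, drop_cpsr_when_unconditional):
    """Return (consumer, producer) index pairs for read-after-write deps."""
    last_writer = {}
    deps = []
    for i, (cond, srcs, dests) in enumerate(program):
        for reg in sorted(sources(cond, srcs, drop_cpsr_when_unconditional)):
            if reg in last_writer:
                deps.append((i, last_writer[reg]))
        for reg in dests:
            last_writer[reg] = i
    return deps


# The two chains from above: cmp / cmpne / cmpne and cmp / cmp / cmp.
conditional = [("al", ["r0"], [CPSR]),
               ("ne", ["r0"], [CPSR]),
               ("ne", ["r0"], [CPSR])]
unconditional = [("al", ["r0"], [CPSR])] * 3
```

In this model, raw_dependencies(unconditional, False) gives the chain
[(1, 0), (2, 1)], while raw_dependencies(unconditional, True) gives an
empty list. The conditional chain keeps its dependencies either way.
Instructions that update only some of the flags would still need Cpsr
as an input, so the flag could not be applied blindly to everything
with an "always" condition.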
Any suggestions on how to do this?
--
Jack Whitham
[email protected]