What's happening is that the pextrb instruction has a three-byte opcode.
This is written up in more detail elsewhere, but in short, as bytes flow
out of memory and are decoded, they go through two main steps.

First, they go through the predecoder, which turns them from a stream of
bytes (of undetermined length) into a standard representation
(ExtMachInst). That representation also carries all the context the
instruction was decoded in that would affect what those bytes mean.

Second, that object is sent through a decoder, which maps it to a
StaticInst class. That class has properties that describe the instruction,
a function that actually does what the instruction does, etc. The mapping
is cached, so there must be a consistent, one-to-one mapping between
ExtMachInsts and StaticInsts.
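
To make the caching point concrete, here is a minimal sketch of the second
step. Everything in it (ToyExtMachInst, ToyStaticInst, ToyDecoder and their
fields) is a made-up stand-in for illustration, not the actual gem5 classes.
It only shows why the key has to carry every piece of decode context: two
lookups with an identical key must be allowed to share one cached object.

```cpp
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <tuple>

// Hypothetical stand-in for ExtMachInst: the opcode bytes plus whatever
// decode context changes what those bytes mean.
struct ToyExtMachInst
{
    uint8_t numOpcodeBytes;  // 1, 2 or 3
    uint8_t opcode;          // the final ("live") opcode byte
    uint8_t modRM;
    uint8_t immediate;
    bool longMode;           // one example of decode context

    // Ordering over *all* fields so the struct can be used as a map key.
    bool operator<(const ToyExtMachInst &o) const
    {
        return std::tie(numOpcodeBytes, opcode, modRM, immediate, longMode) <
               std::tie(o.numOpcodeBytes, o.opcode, o.modRM, o.immediate,
                        o.longMode);
    }
};

// Hypothetical stand-in for StaticInst.
struct ToyStaticInst
{
    std::string mnemonic;
};

class ToyDecoder
{
  public:
    int misses = 0;  // how many times an instruction object had to be built

    // Return the cached object for this key, building it on first use.
    std::shared_ptr<ToyStaticInst> decode(const ToyExtMachInst &emi)
    {
        auto it = cache.find(emi);
        if (it != cache.end())
            return it->second;
        ++misses;
        auto inst = std::make_shared<ToyStaticInst>(build(emi));
        cache.emplace(emi, inst);
        return inst;
    }

  private:
    // Placeholder for the real opcode-to-instruction mapping.
    static ToyStaticInst build(const ToyExtMachInst &emi)
    {
        ToyStaticInst inst;
        inst.mnemonic = "opcode_" +
            std::to_string(static_cast<int>(emi.numOpcodeBytes)) + "byte_" +
            std::to_string(static_cast<int>(emi.opcode));
        return inst;
    }

    std::map<ToyExtMachInst, std::shared_ptr<ToyStaticInst>> cache;
};
```

Decoding the same ToyExtMachInst twice returns the same pointer and counts
one miss. Changing any field, including a context field like longMode,
produces a separate cache entry.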

The predecoder (step one) was written with a nod to three-byte opcodes, but
it doesn't really support them. For instance, it uses tables in
src/arch/x86/decoder_tables.cc to figure out various things about an
instruction, and there are only two versions of those tables: one for
one-byte opcodes and one for two-byte opcodes. Three-byte opcodes will, I
believe, index outside the bounds of the array and read something
arbitrary. Because those tables determine some of what's in an instruction
and how big things like immediates are, the size of the pextrb instruction
is being miscalculated, and gem5 then tries to execute from the middle of
an instruction.
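
As a rough illustration of that failure mode, the sketch below models a
pair of per-opcode tables selected by the number of opcode bytes. The names
(NumTables, toyImmSizeTables, lookupImmediateSize) and the table layout are
assumptions for the example and do not mirror decoder_tables.cc. The point
is only that a lookup with three opcode bytes has no table to land in, so
without a range check it would read past the end of the array.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Only two tables exist: one-byte opcodes and two-byte (0F xx) opcodes.
constexpr std::size_t NumTables = 2;

using ImmSizeTable = std::array<uint8_t, 256>;

// Hypothetical table contents; only two entries are filled in.
inline const std::array<ImmSizeTable, NumTables> &toyImmSizeTables()
{
    static const std::array<ImmSizeTable, NumTables> tables = [] {
        std::array<ImmSizeTable, NumTables> t{};  // all zero
        t[0][0x6A] = 1;  // push imm8 (one-byte opcode 6A)
        t[1][0x70] = 1;  // pshufw/pshufd imm8 (two-byte opcode 0F 70)
        return t;
    }();
    return tables;
}

// Look up the immediate size for an opcode. Returns false, leaving sizeOut
// untouched, when no table exists for that opcode length. Dropping this
// check is what would turn a three-byte opcode into an out-of-bounds read.
bool lookupImmediateSize(unsigned numOpcodeBytes, uint8_t opcode,
                         uint8_t &sizeOut)
{
    if (numOpcodeBytes < 1 || numOpcodeBytes > NumTables)
        return false;
    sizeOut = toyImmSizeTables()[numOpcodeBytes - 1][opcode];
    return true;
}
```

With this check, lookupImmediateSize(3, 0x14, size) reports failure instead
of returning whatever happens to sit after the second table.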

To properly support three-byte opcodes (or even four-byte opcodes!), the
tables would need to be expanded to have information about them.
Unfortunately, that also introduces a complication. To a fairly robust
approximation, the relevant properties of one- and two-byte instructions
can be determined just from how many opcode bytes they have and what the
final, live opcode byte is, ignoring escape bytes. There is more than one
escape byte for the three-byte opcodes, though. Somebody would have to
verify that all three-byte opcodes with a given live byte have the same
properties regardless of which escape bytes precede it, or we'd have to
modify how the predecoder works to take that into account.
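
One possible shape for the second option is to key the tables on the escape
sequence rather than on the opcode length alone. The sketch below assumes
legacy and REX prefixes have already been consumed, and all of its names
(OpcodeMap, OpcodeInfo, classifyOpcode, toyImmediateSize) are hypothetical.
It fills in a single entry to show why the distinction can matter: live
byte 0x14 after 0F 3A is pextrb, which takes an 8-bit immediate, while 0x14
after 0F 38 is blendvps, which takes none.

```cpp
#include <cstddef>
#include <cstdint>

// One table per escape sequence instead of one per opcode length.
enum OpcodeMap
{
    OneByte,
    TwoByte0F,
    ThreeByte0F38,
    ThreeByte0F3A,
    NumOpcodeMaps
};

struct OpcodeInfo
{
    OpcodeMap map;
    uint8_t opcode;      // the live opcode byte
    std::size_t length;  // opcode bytes consumed, escapes included
};

// Classify the opcode at the start of bytes[0..len). Returns false if the
// buffer ends before the opcode is complete.
bool classifyOpcode(const uint8_t *bytes, std::size_t len, OpcodeInfo &out)
{
    if (len == 0)
        return false;
    if (bytes[0] != 0x0F) {
        out = {OneByte, bytes[0], 1};
        return true;
    }
    if (len < 2)
        return false;
    if (bytes[1] == 0x38 || bytes[1] == 0x3A) {
        if (len < 3)
            return false;
        out = {bytes[1] == 0x38 ? ThreeByte0F38 : ThreeByte0F3A, bytes[2], 3};
        return true;
    }
    out = {TwoByte0F, bytes[1], 2};
    return true;
}

// Immediate size in bytes, from a table indexed by (map, live byte).
uint8_t toyImmediateSize(const OpcodeInfo &info)
{
    struct Tables
    {
        uint8_t immSize[NumOpcodeMaps][256];
        Tables() : immSize()  // zero-fill, then set the known entry
        {
            immSize[ThreeByte0F3A][0x14] = 1;  // pextrb: 0F 3A 14 /r ib
            // immSize[ThreeByte0F38][0x14] stays 0: blendvps, no immediate
        }
    };
    static const Tables tables;
    return tables.immSize[info.map][info.opcode];
}
```

The extra cost per instruction here is one or two byte comparisons and a
wider table index, which is the kind of overhead that would need measuring
before committing to this layout.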

One thing to keep in mind is that the predecoder processes every single
byte decoded for execution, with no caching. Making it even slightly slower
could have a meaningful impact on overall performance.

Gabe

On Mon, Nov 10, 2014 at 3:16 PM, Ahmed Khawaja via gem5-dev <
[email protected]> wrote:

> Greetings,
>
> I was able to successfully reproduce the error using a standalone
> micro-program (which I have attached); the source code is listed below.
> The program runs fine standalone (it doesn't really do anything), but it
> crashes in gem5 with the same signature: a random ADD instruction
> appearing after the PEXTRB instruction. Two things to note. First, in the
> committed version of GEM5, PEXTRB is not implemented, but it SHOULD still
> be decoded properly. Second, the program works fine if I change the PEXTRB
> instruction to reference a register instead of memory; the bug appears
> only when the destination is a memory operand. The instruction listed
> below is a copy of what was in my original program. Furthermore, I tried
> running it with an immediate byte != 0 and the program did NOT crash in
> GEM5. Any help would be greatly appreciated!
>
>
> // Trace Error
>
> 5269000: system.cpu + A0 T0 : @main+4    : push rax
> 5269000: system.cpu + A0 T0 : @main+4.0  :   PUSH_R : st   rax, SS:[rsp +
> 0xfffffffffffffff8] : MemWrite :  D=0x000000000040105e A=0x7fffffffed68
>
>  flags=(IsInteger|IsMemRef|IsStore|IsMicroop|IsDelayedCommit|IsFirstMicroop)
> 5269500: system.cpu + A0 T0 : @main+4.1  :   PUSH_R : subi   rsp, rsp, 0x8
> : IntAlu :  D=0x00007fffffffed68  flags=(IsInteger|IsMicroop|IsLastMicroop)
> 5270000: system.cpu + A0 T0 : @main+5    : push rdx
> 5270000: system.cpu + A0 T0 : @main+5.0  :   PUSH_R : st   rdx, SS:[rsp +
> 0xfffffffffffffff8] : MemWrite :  D=0x00007fffffffee68 A=0x7fffffffed60
>
>  flags=(IsInteger|IsMemRef|IsStore|IsMicroop|IsDelayedCommit|IsFirstMicroop)
> 5270500: system.cpu + A0 T0 : @main+5.1  :   PUSH_R : subi   rsp, rsp, 0x8
> : IntAlu :  D=0x00007fffffffed60  flags=(IsInteger|IsMicroop|IsLastMicroop)
> 5271500: system.cpu + A0 T0 : @main+6    : mov rdi, 0
> 5271500: system.cpu + A0 T0 : @main+6.0  :   MOV_R_I : limm   rdx, 0  :
> IntAlu :  D=0x0000000000000000
>  flags=(IsInteger|IsMicroop|IsLastMicroop|IsFirstMicroop)
> 5272000: system.cpu + A0 T0 : @main+13    : mov rax, rsp
> 5272000: system.cpu + A0 T0 : @main+13.0  :   MOV_R_R : mov   rax, rax, rsp
> : IntAlu :  D=0x00007fffffffed60
>  flags=(IsInteger|IsMicroop|IsLastMicroop|IsFirstMicroop)
> 5273000: system.cpu + A0 T0 : @main+16    : sub rax, 0xc
> 5273000: system.cpu + A0 T0 : @main+16.0  :   SUB_R_I : limm   t1, 0xc :
> IntAlu :  D=0x000000000000000c
>  flags=(IsInteger|IsMicroop|IsDelayedCommit|IsFirstMicroop)
> 5273500: system.cpu + A0 T0 : @main+16.1  :   SUB_R_I : sub   rax, rax, t1
> : IntAlu :  D=0x0000000000000000
>  flags=(IsInteger|IsCC|IsMicroop|IsLastMicroop)
> 5274000: system.cpu + A0 T0 : @main+20    : pextrb DS:[rax], ax, 0
> 5274000: system.cpu + A0 T0 : @main+20.0  :   PEXTRB_M_XMM_I : limm   t0w,
> 0x755745 : IntAlu :  D=0x0000000000005745
>  flags=(IsInteger|IsMicroop|IsDelayedCommit|IsFirstMicroop)
> 5274500: system.cpu + A0 T0 : @main+20.1  :   PEXTRB_M_XMM_I : st   t0b,
> DS:[rax] : MemWrite :  D=0x0000000000000000 A=0x7fffffffed54
>  flags=(IsInteger|IsMemRef|IsStore|IsMicroop|IsLastMicroop)
> 5275500: system.cpu + A0 T0 : @main+24    : add r9b, DS:[rax + rax]
> 5275500: system.cpu + A0 T0 : @main+24.0  :   ADD_R_M : ld   t1b, DS:[rax +
> rax] : MemRead :  A=0xffffffffdaa8
>  flags=(IsInteger|IsMemRef|IsLoad|IsMicroop|IsDelayedCommit|IsFirstMicroop)
> panic: Tried to read unmapped address 0xffffffffdaa8.
>
>
> // Compile me with: g++ -static -msse4 sse_test.cpp
> #include "smmintrin.h"
> int main()  {
> asm(
> "pushq %rax \n"
> "pushq %rdx \n"
> "movq $0x0, %rdx \n"
> "movq %rsp, %rax \n"
> "subq $0xc, %rax \n"
> "pextrb $0x0,%xmm0, 0xc(%rdx,%rax,1) \n"
> "pxor %xmm0, %xmm0 \n"
> "popq %rdx \n"
> "popq %rax \n"
> );
> return 0;
> }
>
> >
> >Do you have a small test program that demonstrates the bug? If so, please
> >send it to me.
> >
> >Gabe
> >
> >On Sat, Nov 8, 2014 at 10:00 AM, Ahmed Khawaja via gem5-dev <
> >[email protected]> wrote:
> >
> > Greetings,
> >
> > I am trying to run an SSE-enabled program through GEM5 in SE mode, and
> > I get the error "panic: Tried to read unmapped address 0x130c1c0". I
> > disassembled my program and ran gem5 with instruction tracing. The
> > offending instruction does NOT appear in the disassembled code. The
> > interesting thing to note is that I added an implementation of the
> > PEXTRB instruction, and the offending instruction (which shouldn't be
> > there) comes directly after the first PEXTRB instruction is executed.
> > The trace shows only one PEXTRB being executed, though in the source
> > code they only appear in sequences of 8 PEXTRB instructions in a row. I
> > am led to believe this is an issue with the front-end decoder. Can
> > anyone tell me whether my diagnosis seems correct, and what level of
> > confidence I should expect in the front-end decoder for an instruction
> > that was NOT implemented?
> >
> >  Thank you,
> >
> >        Ahmed Khawaja
> > _______________________________________________
> > gem5-dev mailing list
> > [email protected]
> > http://m5sim.org/mailman/listinfo/gem5-dev
> >