So, basically, what's happening is that the pextrb instruction has a three-byte opcode. This is written up in more detail elsewhere, but in short, as bytes flow out of memory and are decoded, they go through two main steps.
First, they go through the predecoder, which turns them from a stream of bytes (of undetermined length) into a standard representation (ExtMachInst) that also carries all the context the instruction was decoded in that would affect what those bytes mean. Second, that object is sent through a decoder, which maps it to a StaticInst class with properties that describe the instruction, a function that actually does what the instruction does, etc. That mapping is cached, so there must be a consistent, one-to-one mapping between ExtMachInsts and StaticInsts.
The predecoder (step one) was written with a nod to three-byte opcodes, but it doesn't really support them. For instance, it uses tables in src/arch/x86/decoder_tables.cc to figure out various things about an instruction, and there are only two versions of those tables: one for one-byte opcodes and one for two-byte opcodes. Three-byte opcodes will, I believe, index outside the bounds of the array and read something arbitrary. Because those tables determine some of what's in an instruction and how big things like immediates are, the size of the pextrb instruction is being miscalculated, and the CPU then tries to execute from the middle of an instruction.
To properly support three-byte opcodes (or even four-byte opcodes!), the tables would need to be expanded to include information about them. Unfortunately, that also introduces a complication. To a fairly robust approximation, the relevant properties of one- and two-byte instructions can be determined just from how many bytes they have and what the actual live opcode byte at the end is, ignoring escape bytes. There are a couple of escape bytes for the three-byte opcodes, though. Somebody would have to verify that all three-byte opcodes with a given live byte have the same properties regardless of which escape byte precedes it, or we'd have to modify how the predecoder works to take that into account.
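To make the table-driven idea concrete, here is a minimal C++ sketch of a length decoder. It is not gem5's predecoder: `OpInfo`, `OpTables` and `instLength` are hypothetical names, only a handful of encodings are handled, and the three-byte tables use a blanket approximation that I believe holds for the legacy SSE encodings (nothing under 0F 38 takes an immediate, everything under 0F 3A takes one imm8 byte). It illustrates why the escape bytes have to select the table: the live byte 0x14 is BLENDVPS (66 0F 38 14 /r, no immediate) behind 0F 38, but PEXTRB (66 0F 3A 14 /r ib) behind 0F 3A. The pextrb in the test program below, `pextrb $0x0,%xmm0,0xc(%rdx,%rax,1)`, should assemble to 66 0F 3A 14 44 02 0C 00; that encoding is hand-assembled, so treat it as an assumption.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, heavily simplified x86-64 length decoder. NOT gem5's code;
// it only models which opcode map an instruction is in, whether it has a
// ModRM byte, and how large its immediate is.
struct OpInfo {
    bool hasModRM;
    unsigned immSize; // immediate size in bytes
};
using OpTable = std::array<OpInfo, 256>;

// One table per opcode map. Only the entries this example needs are filled
// in; the three-byte tables use an assumed blanket rule for legacy SSE.
struct OpTables {
    OpTable oneByte{}, twoByte{}, threeByte0F38{}, threeByte0F3A{};

    OpTables() {
        oneByte[0x02] = {true, 0};                    // ADD r8, r/m8
        for (auto &e : threeByte0F38) e = {true, 0};  // e.g. 0x14 = BLENDVPS
        for (auto &e : threeByte0F3A) e = {true, 1};  // e.g. 0x14 = PEXTRB, imm8
    }
};

// Returns the length in bytes of the instruction starting at bytes[start].
std::size_t instLength(const OpTables &t, const std::vector<std::uint8_t> &bytes,
                       std::size_t start = 0) {
    std::size_t pos = start;
    auto peek = [&]() -> std::uint8_t {
        assert(pos < bytes.size() && "ran off the end of the byte stream");
        return bytes[pos];
    };
    auto next = [&]() -> std::uint8_t {
        std::uint8_t b = peek();
        ++pos;
        return b;
    };

    if (peek() == 0x66) next();           // operand-size prefix
    if ((peek() & 0xF0) == 0x40) next();  // REX prefix (64-bit mode)

    // The escape bytes, not just the final opcode byte, choose the table.
    const OpTable *table = &t.oneByte;
    if (peek() == 0x0F) {
        next();
        if (peek() == 0x38) { next(); table = &t.threeByte0F38; }
        else if (peek() == 0x3A) { next(); table = &t.threeByte0F3A; }
        else table = &t.twoByte;
    }
    const OpInfo info = (*table)[next()]; // the "live" opcode byte

    if (info.hasModRM) {
        const std::uint8_t modrm = next();
        const unsigned mod = modrm >> 6, rm = modrm & 7;
        if (mod != 3 && rm == 4) {                    // SIB byte follows
            const std::uint8_t sib = next();
            if (mod == 0 && (sib & 7) == 5) pos += 4; // no base: disp32
        }
        if (mod == 1) pos += 1;                       // disp8
        else if (mod == 2) pos += 4;                  // disp32
        else if (mod == 0 && rm == 5) pos += 4;       // RIP-relative disp32
    }
    pos += info.immSize;
    return pos - start;
}
```

With those bytes, `instLength(OpTables{}, bytes)` should return 8. The trace in the quoted report instead shows the next instruction at main+24, only four bytes later. If the encoding above is right, decoding from that point starts at 44 02 0C 00: a REX.R prefix followed by ADD r8, r/m8 with a SIB byte, i.e. `add r9b, [rax + rax]`, which matches the bogus instruction in the trace.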
One thing to keep in mind is that the predecoder processes every single byte decoded for execution, with no caching. Making it slower, even by a small amount, could have a meaningful impact on overall performance.

Gabe

On Mon, Nov 10, 2014 at 3:16 PM, Ahmed Khawaja via gem5-dev <[email protected]> wrote:

> Greetings,
>
> I was able to successfully reproduce the error using a stand-alone micro-program (which I have attached); the source code is listed below. The program runs fine stand-alone (it doesn't really do anything), but crashes in gem5 with the same signature: a random ADD instruction appearing after the PEXTRB instruction. Two things to note: in the committed version of GEM5, PEXTRB is not implemented, but it SHOULD still be decoded properly. The program works fine if I change the PEXTRB instruction to reference a register instead of memory; the bug appears only when the destination is a memory operand. The instruction listed below is a copy of what was in my original program. Furthermore, I tried running it with the immediate byte != 0, and the program did NOT crash in GEM5. Any help would be greatly appreciated!
>
> // Trace Error
>
> 5269000: system.cpu + A0 T0 : @main+4 : push rax
> 5269000: system.cpu + A0 T0 : @main+4.0 : PUSH_R : st rax, SS:[rsp + 0xfffffffffffffff8] : MemWrite : D=0x000000000040105e A=0x7fffffffed68 flags=(IsInteger|IsMemRef|IsStore|IsMicroop|IsDelayedCommit|IsFirstMicroop)
> 5269500: system.cpu + A0 T0 : @main+4.1 : PUSH_R : subi rsp, rsp, 0x8 : IntAlu : D=0x00007fffffffed68 flags=(IsInteger|IsMicroop|IsLastMicroop)
> 5270000: system.cpu + A0 T0 : @main+5 : push rdx
> 5270000: system.cpu + A0 T0 : @main+5.0 : PUSH_R : st rdx, SS:[rsp + 0xfffffffffffffff8] : MemWrite : D=0x00007fffffffee68 A=0x7fffffffed60 flags=(IsInteger|IsMemRef|IsStore|IsMicroop|IsDelayedCommit|IsFirstMicroop)
> 5270500: system.cpu + A0 T0 : @main+5.1 : PUSH_R : subi rsp, rsp, 0x8 : IntAlu : D=0x00007fffffffed60 flags=(IsInteger|IsMicroop|IsLastMicroop)
> 5271500: system.cpu + A0 T0 : @main+6 : mov rdi, 0
> 5271500: system.cpu + A0 T0 : @main+6.0 : MOV_R_I : limm rdx, 0 : IntAlu : D=0x0000000000000000 flags=(IsInteger|IsMicroop|IsLastMicroop|IsFirstMicroop)
> 5272000: system.cpu + A0 T0 : @main+13 : mov rax, rsp
> 5272000: system.cpu + A0 T0 : @main+13.0 : MOV_R_R : mov rax, rax, rsp : IntAlu : D=0x00007fffffffed60 flags=(IsInteger|IsMicroop|IsLastMicroop|IsFirstMicroop)
> 5273000: system.cpu + A0 T0 : @main+16 : sub rax, 0xc
> 5273000: system.cpu + A0 T0 : @main+16.0 : SUB_R_I : limm t1, 0xc : IntAlu : D=0x000000000000000c flags=(IsInteger|IsMicroop|IsDelayedCommit|IsFirstMicroop)
> 5273500: system.cpu + A0 T0 : @main+16.1 : SUB_R_I : sub rax, rax, t1 : IntAlu : D=0x0000000000000000 flags=(IsInteger|IsCC|IsMicroop|IsLastMicroop)
> 5274000: system.cpu + A0 T0 : @main+20 : pextrb DS:[rax], ax, 0
> 5274000: system.cpu + A0 T0 : @main+20.0 : PEXTRB_M_XMM_I : limm t0w, 0x755745 : IntAlu : D=0x0000000000005745 flags=(IsInteger|IsMicroop|IsDelayedCommit|IsFirstMicroop)
> 5274500: system.cpu + A0 T0 : @main+20.1 : PEXTRB_M_XMM_I : st t0b, DS:[rax] : MemWrite : D=0x0000000000000000 A=0x7fffffffed54 flags=(IsInteger|IsMemRef|IsStore|IsMicroop|IsLastMicroop)
> 5275500: system.cpu + A0 T0 : @main+24 : add r9b, DS:[rax + rax]
> 5275500: system.cpu + A0 T0 : @main+24.0 : ADD_R_M : ld t1b, DS:[rax + rax] : MemRead : A=0xffffffffdaa8 flags=(IsInteger|IsMemRef|IsLoad|IsMicroop|IsDelayedCommit|IsFirstMicroop)
> panic: Tried to read unmapped address 0xffffffffdaa8.
>
> // Compile me with: g++ -static -msse4 sse_test.cpp
> #include <smmintrin.h>
> int main() {
>     asm(
>         "pushq %rax \n"
>         "pushq %rdx \n"
>         "movq $0x0, %rdx \n"
>         "movq %rsp, %rax \n"
>         "subq $0xc, %rax \n"
>         "pextrb $0x0,%xmm0, 0xc(%rdx,%rax,1) \n"
>         "pxor %xmm0, %xmm0 \n"
>         "popq %rdx \n"
>         "popq %rax \n"
>     );
>     return 0;
> }
>
> > Do you have a small test program that demonstrates the bug? If so, please send it to me.
> >
> > Gabe
> >
> > On Sat, Nov 8, 2014 at 10:00 AM, Ahmed Khawaja via gem5-dev <[email protected]> wrote:
> >
> > > Greetings,
> > >
> > > I am trying to run an SSE-enabled program through GEM5 in SE mode, and I get an error: "panic: Tried to read unmapped address 0x130c1c0". I disassembled my program and ran gem5 with instruction tracing. The offending instruction does NOT appear in the disassembled code. The interesting thing to note is that I added an implementation of the PEXTRB instruction, and the offending instruction (which shouldn't be there) comes directly after the first PEXTRB instruction is executed. The trace only shows one being executed, though in the source code they only appear in sequences of 8 PEXTRB instructions in a row. I am led to believe this is an issue with the front-end decoder. Can anyone tell me if my diagnosis seems correct, and what level of confidence I should expect in the front-end decoder for an instruction that was NOT implemented?
> > >
> > > Thank you,
> > >
> > > Ahmed Khawaja
> > _______________________________________________
> > gem5-dev mailing list
> > [email protected]
> > http://m5sim.org/mailman/listinfo/gem5-dev
