I don't think anything has happened with this since this email. It is
blocking getting timing mode working in X86_FS, and I'm not sure what to
do about it since it really looks like a gcc bug. I would be delighted
if someone could prove me wrong, because then at least I'd have a chance
to fix it. Failing that, how can we work around this? It seems like a
fairly deeply rooted issue, and I'm not sure what I did to hit it.

Gabe

Gabe Black wrote:
> The uncommitted patches I have are attached, and my command line is
> below if you want to try it yourself.
>
> build/X86_FS/m5.opt configs/example/fs.py --timing
> --kernel=/dist/m5/system/binaries/x86_64-vmlinux-2.6.22.9
>
> Gabe Black wrote:
>> I just did. Surprisingly, when I upgraded gcc to 4.3.something I forgot
>> to source /env/profile, so I was still using 4.1. When I really upgraded
>> to 4.3, it still tries to execute the vtable, but since there's a
>> different address it gets an undefined opcode exception instead of what
>> I'm assuming is a page fault. Also, I don't think that part is marked no
>> execute. I think the first two bytes of the address coincidentally
>> decode to an instruction that does something, and then the bytes after
>> that decode to something that causes a page fault. That would be why the
>> page fault happens with the rip (PC) a few bytes into the address.
>> (looks at the manual) The address is 0x4e1f74, and 74 is jz with a byte
>> offset. zf wasn't set, so that'd be a noop. 0x4e is a REX prefix, and
>> the zeros afterwards I think become add (%r8), %rax. Since %r8 is zero,
>> that's dereferencing zero -> page fault -> seg fault.
>>
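To make the byte-level reading above concrete, here is a minimal C++ sketch of how the pointer value 0x4e1f74 would sit in memory on little-endian x86, which is the order the CPU would fetch it in if it started executing the vtable slot. The helper names are made up for illustration and are not part of M5.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstdio>

// Split a 64-bit value into the byte order x86 stores it in memory
// (least significant byte first). Uses shifts, so the result does not
// depend on the endianness of the machine running this sketch.
std::array<std::uint8_t, 8> littleEndianBytes(std::uint64_t value)
{
    std::array<std::uint8_t, 8> bytes{};
    for (int i = 0; i < 8; ++i)
        bytes[i] = static_cast<std::uint8_t>((value >> (8 * i)) & 0xff);
    return bytes;
}

// Hypothetical helper: print the bytes that would be fetched as code
// if execution jumped to a slot holding the pointer 0x4e1f74.
void dumpVtableSlotBytes()
{
    const auto bytes = littleEndianBytes(0x4e1f74);
    assert(bytes[0] == 0x74); // the low byte is fetched first
    for (std::uint8_t b : bytes)
        std::printf("%02x ", static_cast<unsigned>(b));
    std::printf("\n");
}
```

The first four bytes come out as 74 1f 4e 00, which matches the reading above: 0x74 is the jz opcode, 0x1f is its byte offset, and 0x4e is the REX prefix in front of the zero bytes.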
>> Gabe
>>
>> nathan binkert wrote:
>>> Have you tried a newer version of gcc to see what the code looks like?
>>>
>>>   Nate
>>>
>>> On Mon, Aug 24, 2009 at 10:06 PM, Gabe Black<[email protected]> wrote:
>>>> This appears to be a gcc bug. I will now explain why. If you don't care,
>>>> stop reading. If you do care and you see some place where I'm wrong,
>>>> please, please let me know.
>>>>
>>>>
>>>>
>>>> The interesting part of the function in question disassembles to the
>>>> following:
>>>>
>>>> 0x0000000000d85fc3 <_ZN16SimpleTimingPort10recvTimingEP6Packet+155>: mov    0x55ab4e(%rip),%rax        # 0x12e0b18 <curTick>
>>>> 0x0000000000d85fca <_ZN16SimpleTimingPort10recvTimingEP6Packet+162>: mov    %rax,%rdx
>>>> 0x0000000000d85fcd <_ZN16SimpleTimingPort10recvTimingEP6Packet+165>: add    -0x8(%rbp),%rdx
>>>> 0x0000000000d85fd1 <_ZN16SimpleTimingPort10recvTimingEP6Packet+169>: mov    -0x20(%rbp),%rsi
>>>> 0x0000000000d85fd5 <_ZN16SimpleTimingPort10recvTimingEP6Packet+173>: mov    -0x18(%rbp),%rdi
>>>> 0x0000000000d85fd9 <_ZN16SimpleTimingPort10recvTimingEP6Packet+177>: callq  0xd85d68 <_ZN16SimpleTimingPort15schedSendTimingEP6Packetl>
>>>> 0x0000000000d85fde <_ZN16SimpleTimingPort10recvTimingEP6Packet+182>: jmp    0xd85ffb <_ZN16SimpleTimingPort10recvTimingEP6Packet+211>
>>>> 0x0000000000d85fe0 <_ZN16SimpleTimingPort10recvTimingEP6Packet+184>: cmpq   $0x0,-0x20(%rbp)
>>>> 0x0000000000d85fe5 <_ZN16SimpleTimingPort10recvTimingEP6Packet+189>: je     0xd85ffb <_ZN16SimpleTimingPort10recvTimingEP6Packet+211>
>>>> 0x0000000000d85fe7 <_ZN16SimpleTimingPort10recvTimingEP6Packet+191>: mov    -0x20(%rbp),%rax
>>>> 0x0000000000d85feb <_ZN16SimpleTimingPort10recvTimingEP6Packet+195>: mov    (%rax),%rax
>>>> 0x0000000000d85fee <_ZN16SimpleTimingPort10recvTimingEP6Packet+198>: add    $0x8,%rax
>>>> 0x0000000000d85ff2 <_ZN16SimpleTimingPort10recvTimingEP6Packet+202>: mov    (%rax),%rax
>>>> 0x0000000000d85ff5 <_ZN16SimpleTimingPort10recvTimingEP6Packet+205>: mov    -0x20(%rbp),%rdi
>>>> 0x0000000000d85ff9 <_ZN16SimpleTimingPort10recvTimingEP6Packet+209>: callq  *%rax
>>>> 0x0000000000d85ffb <_ZN16SimpleTimingPort10recvTimingEP6Packet+211>: movl   $0x1,-0x24(%rbp)
>>>> 0x0000000000d86002 <_ZN16SimpleTimingPort10recvTimingEP6Packet+218>: mov    -0x24(%rbp),%eax
>>>> 0x0000000000d86005 <_ZN16SimpleTimingPort10recvTimingEP6Packet+221>: leaveq
>>>> 0x0000000000d86006 <_ZN16SimpleTimingPort10recvTimingEP6Packet+222>: retq
>>>>
>>>> The part where it has a heart attack is at +209 where it tries to call
>>>> through the value in memory pointed to by %rax. If you look above that a
>>>> few instructions at +191, you'll see where it gets a value off of the
>>>> stack using %rbp, the frame pointer, and puts that into %rax. That value
>>>> is the pointer pkt.
>>>>
>>>> (gdb) p pkt
>>>> $7 = (PacketPtr) 0x1bd6f40
>>>> (gdb) p/x *(uint64_t *)($rbp - 0x20)
>>>> $10 = 0x1bd6f40
>>>>
>>>> Because pkts are reference-counting pointers, %rax actually points to a
>>>> structure that contains the pointer to the real packet. The instruction
>>>> at +195 removes that level of indirection. Next, the line at +198 adds 8
>>>> to %rax, making it point to the vtable pointer corresponding to the
>>>> Printable base class. You can see that here after all the static members.
>>>>
>>>> (gdb) p *pkt
>>>> $11 = {<FastAlloc> = {_vptr.FastAlloc = 0x1bd7060, static Max_Alloc_Size
>>>> = 512, static Log2_Alloc_Quantum = 3, static Alloc_Quantum = 8, static
>>>> Num_Buckets = 65, static Num_Structs_Per_New = <optimized out>, static
>>>> freeLists = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x2912c50, 0x0,
>>>> 0x1bcf358, 0x2b7e8f0, 0x1bd61a0,
>>>>      0x1bd6f40, 0x0 <repeats 52 times>}}, <Printable> =
>>>> {_vptr.Printable = 0xdd7d70}, static PUBLIC_FLAGS = <optimized out>,
>>>> static PRIVATE_FLAGS = <optimized out>, static COPY_FLAGS = 15, static
>>>> SHARED = 1, static EXPRESS_SNOOP = 2, static SUPPLY_EXCLUSIVE = 4,
>>>> static MEM_INHIBIT = 8, static VALID_ADDR = 256,
>>>>  static VALID_SIZE = 512, static VALID_SRC = 1024, static VALID_DST =
>>>> 2048, static STATIC_DATA = 4096, static DYNAMIC_DATA = 8192, static
>>>> ARRAY_DATA = 16384, flags = {_flags = 3840}, cmd = {static commandInfo =
>>>> 0x12e6080, cmd = MemCmd::MessageResp}, req = 0x2b7e8f0, data = 0x0, addr
>>>> = 11529215046068469760,
>>>>  size = 4, src = 0, dest = 8, origCmd = {static commandInfo =
>>>> 0x12e6080, cmd = MemCmd::MessageReq}, time = 231966339456, finishTime =
>>>> 231966444000, firstWordTime = 231966445000, static Broadcast = -1,
>>>> senderState = 0x0}
>>>>
>>>> To make sure it's pointed at the right thing,
>>>>
>>>> (gdb) p/x *(uint64_t *)((uint8_t *)pkt + 8)
>>>> $13 = 0xdd7d70
>>>>
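The offset of 8 can be reproduced outside of M5. The following is a small sketch with stand-in classes (FastAllocLike, PrintableLike and PacketLike are hypothetical, not the real M5 types) that measures where the second polymorphic base lands inside the derived object. The exact value depends on the ABI; on x86-64 Linux with gcc I would expect 8, since the first base holds only its vtable pointer.

```cpp
#include <cassert>
#include <cstddef>

// Stand-ins for the two polymorphic bases of Packet (hypothetical types).
struct FastAllocLike { virtual ~FastAllocLike() {} };
struct PrintableLike { virtual ~PrintableLike() {} };

// Stand-in for Packet: inherits from both, in the same order.
struct PacketLike : FastAllocLike, PrintableLike { int size = 4; };

// Byte offset of the PrintableLike subobject within a PacketLike.
// static_cast performs the base-pointer adjustment; the difference
// between the adjusted and unadjusted addresses is the offset.
std::ptrdiff_t printableOffset(const PacketLike &pkt)
{
    const auto *whole = reinterpret_cast<const unsigned char *>(&pkt);
    const auto *base = reinterpret_cast<const unsigned char *>(
        static_cast<const PrintableLike *>(&pkt));
    return base - whole;
}

// Sanity check: the second base cannot overlap the first base's
// vtable pointer, so its offset is at least one pointer in size.
void checkPrintableOffset()
{
    PacketLike pkt;
    assert(printableOffset(pkt) >=
           static_cast<std::ptrdiff_t>(sizeof(void *)));
}
```

The same adjustment is what the add $0x8 at +198 is doing by hand before the second vtable pointer is loaded.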
>>>> Next, we can see that %rax is again dereferenced at +202. This is
>>>> extracting the pointer to the virtual destructor of Printable from its
>>>> vtable.
>>>>
>>>> (gdb) x/gx *(uint64_t *)((uint8_t *)pkt + 8)
>>>> 0xdd7d70 <_ZTV9Printable+16>:   0x00000000004e1f74
>>>>
>>>> (gdb) disassemble 0x00000000004e1f74
>>>> Dump of assembler code for function ~Printable:
>>>> 0x00000000004e1f74 <~Printable+0>:      push   %rbp
>>>> 0x00000000004e1f75 <~Printable+1>:      mov    %rsp,%rbp
>>>> 0x00000000004e1f78 <~Printable+4>:      sub    $0x10,%rsp
>>>> 0x00000000004e1f7c <~Printable+8>:      mov    %rdi,-0x8(%rbp)
>>>> 0x00000000004e1f80 <~Printable+12>:     mov    $0xdd7d70,%edx
>>>> 0x00000000004e1f85 <~Printable+17>:     mov    -0x8(%rbp),%rax
>>>> 0x00000000004e1f89 <~Printable+21>:     mov    %rdx,(%rax)
>>>> 0x00000000004e1f8c <~Printable+24>:     mov    $0x0,%eax
>>>> 0x00000000004e1f91 <~Printable+29>:     test   %al,%al
>>>> 0x00000000004e1f93 <~Printable+31>:     je     0x4e1f9e <~Printable+42>
>>>> 0x00000000004e1f95 <~Printable+33>:     mov    -0x8(%rbp),%rdi
>>>> 0x00000000004e1f99 <~Printable+37>:     callq  0x409340 <_zd...@plt>
>>>> 0x00000000004e1f9e <~Printable+42>:     leaveq
>>>> 0x00000000004e1f9f <~Printable+43>:     retq
>>>> End of assembler dump.
>>>>
>>>> Now %rax holds the value 0xdd7d70, the pointer to the Printable vtable
>>>> plus offset 0, which holds the pointer to the destructor.
>>>>
>>>> (gdb) info registers
>>>> rax            0xdd7d70 14515568
>>>> rbx            0x1731f10        24321808
>>>> rcx            0x2d43c20        47463456
>>>> rdx            0xc      12
>>>> rsi            0x60     96
>>>> rdi            0x1bd6f40        29192000
>>>> rbp            0x7fff2cbc0fd0   0x7fff2cbc0fd0
>>>> rsp            0x7fff2cbc0fa0   0x7fff2cbc0fa0
>>>> r8             0x0      0
>>>> r9             0x0      0
>>>> r10            0x1bc7f30        29130544
>>>> r11            0x7fff2cbc0cf0   140733943909616
>>>> r12            0x7f5824b4ecb0   140016549686448
>>>> r13            0x1bd3f80        29179776
>>>> r14            0x1731f10        24321808
>>>> r15            0x7f58243844a0   140016541516960
>>>> rip            0xd85ffb 0xd85ffb <SimpleTimingPort::recvTiming(Packet*)+211>
>>>> eflags         0x10202  [ IF RF ]
>>>> cs             0x33     51
>>>> ss             0x2b     43
>>>> ds             0x0      0
>>>> es             0x0      0
>>>> fs             0x0      0
>>>> gs             0x0      0
>>>> fctrl          0x37f    895
>>>> fstat          0x0      0
>>>> ftag           0xffff   65535
>>>> fiseg          0x0      0
>>>> fioff          0x0      0
>>>> foseg          0x0      0
>>>> fooff          0x0      0
>>>> fop            0x0      0
>>>> mxcsr          0x1fa0   [ PE IM DM ZM OM UM PM ]
>>>>
>>>> The pkt pointer is then put into %rdi, I believe to act as the "this"
>>>> pointer, and the value pointed to by %rax is called.
>>>>
>>>> Almost all of this is correct so far, but this is the point where things
>>>> break.
>>>>
>>>> If we look at the encoding for the call instruction, we get the following:
>>>>
>>>> (gdb) x/3b (_ZN16SimpleTimingPort10recvTimingEP6Packet+209)
>>>> 0xd85ff9 <_ZN16SimpleTimingPort10recvTimingEP6Packet+209>:      0xff    0xd0    0xc7
>>>>
>>>> Looking in table A-2 of AMD manual 3, we see that 0xff is the one-byte
>>>> opcode that encodes a group 5 instruction. We now need to look at the
>>>> following ModRM byte, 0xd0. That byte breaks down as mod=3, reg=2, and
>>>> r/m=0. Looking at table A-6, we see that a reg field of 2 encodes a CALL
>>>> instruction with an Ev argument. Looking in the operand syntax notation
>>>> key at the top of A.1, E is for a general purpose register or memory
>>>> operand specified by the ModRM byte. Looking at table A-15, we can see
>>>> that with a mod field of 3, the operand is always a register value, not
>>>> the location pointed to by the register value.
>>>>
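The field breakdown above can be double checked mechanically. This is a minimal sketch, assuming only the standard ModRM layout (mod in bits 7-6, reg in bits 5-3, r/m in bits 2-0); the struct and function names are hypothetical and not taken from M5 or any decoder library.

```cpp
#include <cassert>
#include <cstdint>

// The three fields packed into an x86 ModRM byte.
struct ModRM {
    unsigned mod; // bits 7-6: addressing mode
    unsigned reg; // bits 5-3: register or opcode extension
    unsigned rm;  // bits 2-0: register or memory base
};

// Split a raw ModRM byte into its fields.
constexpr ModRM decodeModRM(std::uint8_t byte)
{
    return ModRM{ (byte >> 6) & 0x3u, (byte >> 3) & 0x7u, byte & 0x7u };
}

// mod == 3 selects the register itself as the operand,
// not the memory location the register points to.
constexpr bool isRegisterDirect(const ModRM &m)
{
    return m.mod == 3;
}

// Compile-time check against the byte that follows 0xff at +209.
static_assert(decodeModRM(0xd0).mod == 3, "mod field of 0xd0");
static_assert(decodeModRM(0xd0).reg == 2, "reg field of 0xd0");
static_assert(decodeModRM(0xd0).rm == 0, "r/m field of 0xd0");

// Runtime version of the same check.
void checkCallEncoding()
{
    const ModRM m = decodeModRM(0xd0);
    assert(m.mod == 3 && m.reg == 2 && m.rm == 0);
    assert(isRegisterDirect(m));
}
```

For 0xd0 this gives mod=3, reg=2, r/m=0, so the operand is the register %rax itself (r/m=0 with no REX.B) rather than memory addressed by it.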
>>>> What that ultimately seems to mean is that gcc is using a mod value of 3
>>>> instead of, for instance, 0, and is inadvertently trying to execute the
>>>> vtable of Printable instead of the function it points to. That piece of
>>>> memory is apparently marked no execute, so the program fortunately dies
>>>> instead of going bananas. gdb is apparently in on it too, and
>>>> disassembles the call instruction to look like it's dereferencing %rax
>>>> when it isn't.
>>>>
>>>> I would very much appreciate it if someone would explain to me why I'm
>>>> wrong since it would be much easier to fix M5 than gcc. Failing that,
>>>> hopefully somebody can get a hold of someone that can actually do
>>>> something about this.
>>>>
>>>> Gabe
>>>> _______________________________________________
>>>> m5-dev mailing list
>>>> [email protected]
>>>> http://m5sim.org/mailman/listinfo/m5-dev