This appears to be a gcc bug. I will now explain why. If you don't care,
stop reading. If you do care and you see some place where I'm wrong,
please, please let me know.



The interesting part of the function in question disassembles to the
following:

0x0000000000d85fc3 <_ZN16SimpleTimingPort10recvTimingEP6Packet+155>:    mov    0x55ab4e(%rip),%rax        # 0x12e0b18 <curTick>
0x0000000000d85fca <_ZN16SimpleTimingPort10recvTimingEP6Packet+162>:    mov    %rax,%rdx
0x0000000000d85fcd <_ZN16SimpleTimingPort10recvTimingEP6Packet+165>:    add    -0x8(%rbp),%rdx
0x0000000000d85fd1 <_ZN16SimpleTimingPort10recvTimingEP6Packet+169>:    mov    -0x20(%rbp),%rsi
0x0000000000d85fd5 <_ZN16SimpleTimingPort10recvTimingEP6Packet+173>:    mov    -0x18(%rbp),%rdi
0x0000000000d85fd9 <_ZN16SimpleTimingPort10recvTimingEP6Packet+177>:    callq  0xd85d68 <_ZN16SimpleTimingPort15schedSendTimingEP6Packetl>
0x0000000000d85fde <_ZN16SimpleTimingPort10recvTimingEP6Packet+182>:    jmp    0xd85ffb <_ZN16SimpleTimingPort10recvTimingEP6Packet+211>
0x0000000000d85fe0 <_ZN16SimpleTimingPort10recvTimingEP6Packet+184>:    cmpq   $0x0,-0x20(%rbp)
0x0000000000d85fe5 <_ZN16SimpleTimingPort10recvTimingEP6Packet+189>:    je     0xd85ffb <_ZN16SimpleTimingPort10recvTimingEP6Packet+211>
0x0000000000d85fe7 <_ZN16SimpleTimingPort10recvTimingEP6Packet+191>:    mov    -0x20(%rbp),%rax
0x0000000000d85feb <_ZN16SimpleTimingPort10recvTimingEP6Packet+195>:    mov    (%rax),%rax
0x0000000000d85fee <_ZN16SimpleTimingPort10recvTimingEP6Packet+198>:    add    $0x8,%rax
0x0000000000d85ff2 <_ZN16SimpleTimingPort10recvTimingEP6Packet+202>:    mov    (%rax),%rax
0x0000000000d85ff5 <_ZN16SimpleTimingPort10recvTimingEP6Packet+205>:    mov    -0x20(%rbp),%rdi
0x0000000000d85ff9 <_ZN16SimpleTimingPort10recvTimingEP6Packet+209>:    callq  *%rax
0x0000000000d85ffb <_ZN16SimpleTimingPort10recvTimingEP6Packet+211>:    movl   $0x1,-0x24(%rbp)
0x0000000000d86002 <_ZN16SimpleTimingPort10recvTimingEP6Packet+218>:    mov    -0x24(%rbp),%eax
0x0000000000d86005 <_ZN16SimpleTimingPort10recvTimingEP6Packet+221>:    leaveq
0x0000000000d86006 <_ZN16SimpleTimingPort10recvTimingEP6Packet+222>:    retq

The part where it has a heart attack is at +209, where it tries to call
through the value in memory pointed to by %rax. A few instructions above
that, at +191, you'll see where it gets a value off of the stack using
%rbp, the frame pointer, and puts that into %rax. That value is the
pointer pkt.

(gdb) p pkt
$7 = (PacketPtr) 0x1bd6f40
(gdb) p/x *(uint64_t)($rbp - 0x20)
$10 = 0x1bd6f40
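
As a quick sanity check on the offsets quoted above, here is a small
sketch that recomputes instruction addresses from the +N offsets in the
listing. The helper name addr_at is made up for illustration, and the
function's start address is inferred from the first listed instruction
(0xd85fc3 at +155) rather than taken from gdb directly.

```python
# Hypothetical helper: map a "+offset" from the gdb listing to an address.
# The start of recvTiming is inferred from the first listed instruction,
# which gdb shows at 0xd85fc3 with offset +155.
FUNC_START = 0xd85fc3 - 155


def addr_at(offset):
    """Return the address of recvTiming+offset."""
    return FUNC_START + offset


for off in (191, 209, 211):
    print(f"+{off}: {addr_at(off):#x}")
```

The results should line up with the addresses in the left column of the
disassembly for those offsets.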

Because pkts are reference counting pointers, %rax actually points to a
structure that contains the pointer to the real packet. The instruction
at +195 removes that level of indirection. Next, the instruction at +198
adds 8 to %rax, making it point to the vtable corresponding to the
Printable base class. You can see that here after all the static members.

(gdb) p *pkt
$11 = {<FastAlloc> = {_vptr.FastAlloc = 0x1bd7060, static Max_Alloc_Size
= 512, static Log2_Alloc_Quantum = 3, static Alloc_Quantum = 8, static
Num_Buckets = 65, static Num_Structs_Per_New = <optimized out>, static
freeLists = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x2912c50, 0x0,
0x1bcf358, 0x2b7e8f0, 0x1bd61a0,
      0x1bd6f40, 0x0 <repeats 52 times>}}, <Printable> =
{_vptr.Printable = 0xdd7d70}, static PUBLIC_FLAGS = <optimized out>,
static PRIVATE_FLAGS = <optimized out>, static COPY_FLAGS = 15, static
SHARED = 1, static EXPRESS_SNOOP = 2, static SUPPLY_EXCLUSIVE = 4,
static MEM_INHIBIT = 8, static VALID_ADDR = 256,
  static VALID_SIZE = 512, static VALID_SRC = 1024, static VALID_DST =
2048, static STATIC_DATA = 4096, static DYNAMIC_DATA = 8192, static
ARRAY_DATA = 16384, flags = {_flags = 3840}, cmd = {static commandInfo =
0x12e6080, cmd = MemCmd::MessageResp}, req = 0x2b7e8f0, data = 0x0, addr
= 11529215046068469760,
  size = 4, src = 0, dest = 8, origCmd = {static commandInfo =
0x12e6080, cmd = MemCmd::MessageReq}, time = 231966339456, finishTime =
231966444000, firstWordTime = 231966445000, static Broadcast = -1,
senderState = 0x0}

To make sure it's pointed at the right thing,

(gdb) p/x *(uint64_t *)((uint8_t *)pkt + 8)
$13 = 0xdd7d70

Next, we can see that %rax is again dereferenced at +202. This is
extracting the pointer to the virtual destructor of Printable from its
vtable.

(gdb) x/gx *(uint64_t *)((uint8_t *)pkt + 8)
0xdd7d70 <_ZTV9Printable+16>:   0x00000000004e1f74

(gdb) disassemble 0x00000000004e1f74
Dump of assembler code for function ~Printable:
0x00000000004e1f74 <~Printable+0>:      push   %rbp
0x00000000004e1f75 <~Printable+1>:      mov    %rsp,%rbp
0x00000000004e1f78 <~Printable+4>:      sub    $0x10,%rsp
0x00000000004e1f7c <~Printable+8>:      mov    %rdi,-0x8(%rbp)
0x00000000004e1f80 <~Printable+12>:     mov    $0xdd7d70,%edx
0x00000000004e1f85 <~Printable+17>:     mov    -0x8(%rbp),%rax
0x00000000004e1f89 <~Printable+21>:     mov    %rdx,(%rax)
0x00000000004e1f8c <~Printable+24>:     mov    $0x0,%eax
0x00000000004e1f91 <~Printable+29>:     test   %al,%al
0x00000000004e1f93 <~Printable+31>:     je     0x4e1f9e <~Printable+42>
0x00000000004e1f95 <~Printable+33>:     mov    -0x8(%rbp),%rdi
0x00000000004e1f99 <~Printable+37>:     callq  0x409340 <_zd...@plt>
0x00000000004e1f9e <~Printable+42>:     leaveq
0x00000000004e1f9f <~Printable+43>:     retq
End of assembler dump.

Now %rax holds the value 0xdd7d70, the pointer to the Printable vtable
plus offset 0, which holds the pointer to the destructor.

(gdb) info registers
rax            0xdd7d70 14515568
rbx            0x1731f10        24321808
rcx            0x2d43c20        47463456
rdx            0xc      12
rsi            0x60     96
rdi            0x1bd6f40        29192000
rbp            0x7fff2cbc0fd0   0x7fff2cbc0fd0
rsp            0x7fff2cbc0fa0   0x7fff2cbc0fa0
r8             0x0      0
r9             0x0      0
r10            0x1bc7f30        29130544
r11            0x7fff2cbc0cf0   140733943909616
r12            0x7f5824b4ecb0   140016549686448
r13            0x1bd3f80        29179776
r14            0x1731f10        24321808
r15            0x7f58243844a0   140016541516960
rip            0xd85ffb 0xd85ffb <SimpleTimingPort::recvTiming(Packet*)+211>
eflags         0x10202  [ IF RF ]
cs             0x33     51
ss             0x2b     43
ds             0x0      0
es             0x0      0
fs             0x0      0
gs             0x0      0
fctrl          0x37f    895
fstat          0x0      0
ftag           0xffff   65535
fiseg          0x0      0
fioff          0x0      0
foseg          0x0      0
fooff          0x0      0
fop            0x0      0
mxcsr          0x1fa0   [ PE IM DM ZM OM UM PM ]

The pkt pointer is then put into %rdi, I believe to act as the "this"
pointer, and the value pointed to by %rax is called.

Almost all of this is correct so far, but this is the point where things
break.

If we look at the encoding for the call instruction, we get the following:

(gdb) x/3b (_ZN16SimpleTimingPort10recvTimingEP6Packet+209)
0xd85ff9 <_ZN16SimpleTimingPort10recvTimingEP6Packet+209>:      0xff    0xd0    0xc7

Looking in table A-2 of AMD manual 3, we see that 0xff is the one-byte
opcode that encodes a group 5 instruction. We now need to look at the
following ModRM byte, 0xd0. That byte breaks down as mod=3, reg=2, and
r/m=0. Looking at table A-6, we see that a reg field of 2 encodes a CALL
instruction with an Ev argument. Looking in the operand syntax notation
key at the top of A.1, E is for a general purpose register or memory
operand specified by the ModRM byte. Looking at table A-15, we can see
that with a mod field of 3, the operand is always a register value, not
the location pointed to by the register value.
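
The field breakdown above can be reproduced with a few shifts and masks.
This is only a sketch; the function name decode_modrm is made up for
illustration, and it assumes the usual ModRM layout of mod in bits 7-6,
reg in bits 5-3, and r/m in bits 2-0.

```python
def decode_modrm(byte):
    """Split a ModRM byte into its (mod, reg, rm) fields."""
    mod = (byte >> 6) & 0b11    # bits 7-6
    reg = (byte >> 3) & 0b111   # bits 5-3
    rm = byte & 0b111           # bits 2-0
    return mod, reg, rm


mod, reg, rm = decode_modrm(0xd0)
print(f"mod={mod} reg={reg} r/m={rm}")  # mod=3 reg=2 r/m=0
```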

What that ultimately seems to mean is that gcc is using a mod value of 3
instead of, for instance, 0, and is inadvertently trying to execute the
vtable of Printable instead of the function it points to. That piece of
memory is apparently marked no execute, so the program fortunately dies
instead of going bananas. gdb is apparently in on it too, and
disassembles the call instruction to look like it's dereferencing %rax
when it isn't.
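
To make the mod=3 versus mod=0 comparison concrete, here is a sketch that
builds the two-byte encoding for each case. The helper name encode_modrm
is made up for illustration; it assumes the same opcode (0xff), the same
reg field (2, for CALL), and the same r/m field (0, for %rax), with only
the mod field changed.

```python
def encode_modrm(mod, reg, rm):
    """Pack (mod, reg, rm) fields into a single ModRM byte."""
    return (mod << 6) | (reg << 3) | rm


OPCODE_GROUP5 = 0xff  # one-byte opcode for the group 5 instructions

# mod=3: the operand is the register value itself (the bytes seen at +209).
register_form = bytes([OPCODE_GROUP5, encode_modrm(3, 2, 0)])

# mod=0: the operand is read from the memory location the register points to.
memory_form = bytes([OPCODE_GROUP5, encode_modrm(0, 2, 0)])

print(register_form.hex(" "))  # ff d0
print(memory_form.hex(" "))    # ff 10
```

So, under those assumptions, the only difference between the two forms is
the second byte: 0xd0 versus 0x10.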

I would very much appreciate it if someone would explain to me why I'm
wrong since it would be much easier to fix M5 than gcc. Failing that,
hopefully somebody can get a hold of someone that can actually do
something about this.

Gabe