Hi, all

In SE mode with the O3 CPU, I am seeing a memory ordering problem between a
load and a system call.
The expected behavior is that the system call (read) is executed (emulated)
and writes its data to memory, and then the load consumes that data.
Instead, the load (memory read) reaches memory before the system call is
executed, and fetches the old value.

The following gem5 trace shows this behavior.

34690000: system.cpu.dcache: ReadReq 87000 miss
34692000: system.coretol2buses: recvTiming: src 2 dst -1 ReadReq 0x87000
34692000: system.l2: ReadReq 87000 miss
34692000: system.coretol2buses: The bus is now occupied from tick 34692000 to 34693000
34693000: system.cpu.icache: ReadReq (ifetch) 5200 hit
34704000: system.membus: recvTiming: src 2 dst -1 ReadReq 0x87000
34704000: system.physmem-port0: recvTiming: ReadReq 0x87000
34704000: system.physmem: enq: 0 ReadReq 0x87000
34704000: system.physmem: Read of size 64 on address 0x87000
34704000: system.physmem: 00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
34704000: system.physmem: 00000010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
34704000: system.physmem: 00000020  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
34704000: system.physmem: 00000030  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00
...
34751000: system.cpu: syscall read called w/arguments 0,8192,1073750016,3
34751000: system.cpu.dcache: functional WriteReq 87000
34751000: system.coretol2buses: recvFunctional: packet src 2 dest 0 addr 0x87000 cmd WriteReq
34751000: system.cpu.icache: functional WriteReq 87000
34751000: system.l2: functional WriteReq 87000
34751000: system.membus: recvFunctional: packet src 2 dest 1 addr 0x87000 cmd WriteReq
34751000: system.physmem-port0: recvFunctional: WriteReq 0x87000
34751000: system.physmem: Write of size 64 on address 0x87000
34751000: system.physmem: 00000000  31 38 30 30 20 32 37 39  30 20 0a 32 32 34 20 32   1800 2790  224 2
34751000: system.physmem: 00000010  32 38 0a 32 32 34 20 32  35 37 0a 32 32 36 20 32   28 224 257 226 2
34751000: system.physmem: 00000020  34 36 0a 32 33 30 20 32  35 34 0a 32 33 31 20 32   46 230 254 231 2
34751000: system.physmem: 00000030  34 31 0a 32 33 32 20 32  36 36 0a 32 33 35 20 32   41 232 266 235 2

The following is the Exec trace. (@__libc_read+24: svc is the producer
system call and @_IO_new_file_underflow+293 is the consumer load.) So the
producer-consumer relation is there in program order.

system.cpu T0 : @__libc_read+24    :   svc                      : IntAlu :
system.cpu T0 : @__libc_read+28    :   mov   r7, r12            : IntAlu :  D=0x0000000000000000
system.cpu T0 : @__libc_read+32    :   cmns   r0, #4096         : IntAlu :  D=0x0000000000000000
system.cpu T0 : @__libc_read+36    :   bxcc                     : IntAlu :
system.cpu T0 : @_IO_file_read+17    :   add   sp, sp, #8         : IntAlu :  D=0x00000000befffbf8
system.cpu T0 : @_IO_file_read+19.0  :   addi_uop   r34, sp, #0   : IntAlu :  D=0x00000000befffbf8
system.cpu T0 : @_IO_file_read+19.1  :   ldr_uop   r6, [r34, #0]  : MemRead :  D=0x0000000000000000 A=0xbefffbf8
system.cpu T0 : @_IO_file_read+19.2  :   ldr_uop   r35, [r34, #4] : MemRead :  D=0x000000000000d1df A=0xbefffbfc
system.cpu T0 : @_IO_file_read+19.3  :   addi_uop   sp, sp, #8    : IntAlu :  D=0x00000000befffc00
system.cpu T0 : @_IO_file_read+19.4  :   uopReg_uop   pc, r35     : IntAlu :  D=0x000000000000d1df
system.cpu T0 : @_IO_new_file_underflow+261    :   cmps   r0, #0 : IntAlu :  D=0x0000000000000001
system.cpu T0 : @_IO_new_file_underflow+263    :   b : IntAlu : Predicated False
system.cpu T0 : @_IO_new_file_underflow+265    :   ldr   r1, [r5, #8] : MemRead :  D=0x0000000040002000 A=0x717d0
system.cpu T0 : @_IO_new_file_underflow+267    :   ldrd.w   r2, r3, [r5, #80] : MemRead :  D=0x00000000ffffffff A=0x71818
system.cpu T0 : @_IO_new_file_underflow+271    :   adds   r1, r1, r0 : IntAlu :  D=0x0000000000000000
system.cpu T0 : @_IO_new_file_underflow+273    :   str   r1, [r5, #8] : MemWrite :  D=0x0000000040004000 A=0x717d0
system.cpu T0 : @_IO_new_file_underflow+275    :   cmps.w   r2, #4294967295 : IntAlu :  D=0x0000000000000001
system.cpu T0 : @_IO_new_file_underflow+279    :   b : IntAlu :
system.cpu T0 : @_IO_new_file_underflow+355    :   cmps.w   r3, #4294967295 : IntAlu :  D=0x0000000000000001
system.cpu T0 : @_IO_new_file_underflow+359    :   b : IntAlu : Predicated False
system.cpu T0 : @_IO_new_file_underflow+361    :   b : IntAlu :
system.cpu T0 : @_IO_new_file_underflow+291    :   ldr   r3, [r5, #4] : MemRead :  D=0x0000000040002000 A=0x717cc
system.cpu T0 : @_IO_new_file_underflow+293    :   ldrb   r0, [r3, #0] : MemRead :  D=0x0000000000000000 A=0x40002000
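
As a quick sanity check on the producer-consumer relation, here is a small
Python sketch (not gem5 output; the variable names are only illustrative).
One of the arguments printed on the "syscall read" line, 1073750016, is
0x40002000 in hex, which is the address (A=) of the consumer ldrb. The first
bytes of the functional write at 0x87000 decode to ASCII text, so the ldrb
should have returned 0x31 rather than 0. That 0x87000 is the physical
address backing 0x40002000 is an assumption I am reading off the traces.

```python
# Values copied from the traces; variable names are illustrative only.
syscall_arg = 1073750016          # from "syscall read called w/arguments ..."
load_vaddr = 0x40002000           # A= of @_IO_new_file_underflow+293

# The syscall argument and the load address are the same number.
assert syscall_arg == load_vaddr

# First 11 bytes written by the functional WriteReq at tick 34751000.
written = bytes.fromhex("31 38 30 30 20 32 37 39 30 20 0a")
text = written.decode("ascii")

# The load returned D=0x00, but the first byte after the write is 0x31.
expected_first_byte = written[0]
print(hex(syscall_arg), repr(text), hex(expected_first_byte))
```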

With the timing CPU, they are executed in order and everything is fine. It
appears that the system call is emulated when it reaches the head of the ROB,
while the younger load has already gone ahead. I would expect a mechanism
that either (1) holds back the load while there is an older system call in
flight, or (2) checks the loaded value later and squashes if it has changed.
(I haven't checked the relevant code yet.) Any idea why neither is working,
or where I should look to find out?
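
To make the two options concrete, here is a toy Python model. It is not gem5
code and says nothing about how the O3 CPU is actually implemented; the
function name and the policy names ("none", "stall", "recheck") are
hypothetical. It only models one buffer byte, using the address and the
old/new values seen in the traces.

```python
# Toy model only: this is NOT how gem5's O3 CPU is implemented.
# Policy names ("none", "stall", "recheck") are made up for illustration.

BUF = 0x40002000       # buffer address from the trace
OLD, NEW = 0x00, 0x31  # byte value before / after the emulated read()


def run(policy):
    """Return the value the load ends up with under the given policy."""
    memory = {BUF: OLD}
    loaded = None

    # Program order: syscall first, load second. An out-of-order core may
    # still issue the younger load before the syscall reaches the ROB head.
    if policy != "stall":
        loaded = memory[BUF]          # early load sees the old value

    # The syscall is emulated when it reaches the head of the ROB.
    memory[BUF] = NEW

    if policy == "stall":
        # Option 1: the load was held back until the syscall had executed.
        loaded = memory[BUF]
    elif policy == "recheck":
        # Option 2: compare later; squash and re-execute on a mismatch.
        if loaded != memory[BUF]:
            loaded = memory[BUF]

    return loaded


for p in ("none", "stall", "recheck"):
    print(p, hex(run(p)))
```

With "none" the load keeps the stale 0x00, which matches what I see in the
trace; with either of the other two policies it ends up with 0x31.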

Thanks,

Min
_______________________________________________
gem5-users mailing list
[email protected]
http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
