Hi all,

In SE mode with the O3 CPU, I am seeing a memory ordering issue between a load and a system call. The expected behavior is that the system call (read) is executed (emulated) and writes its data to memory, and then the load consumes that data. Instead, the load (memory read) somehow reaches memory before the system call executes and fetches the old value.

The following is the gem5 trace showing this behavior.

34690000: system.cpu.dcache: ReadReq 87000 miss
34692000: system.coretol2buses: recvTiming: src 2 dst -1 ReadReq 0x87000
34692000: system.l2: ReadReq 87000 miss
34692000: system.coretol2buses: The bus is now occupied from tick 34692000 to 34693000
34693000: system.cpu.icache: ReadReq (ifetch) 5200 hit
34704000: system.membus: recvTiming: src 2 dst -1 ReadReq 0x87000
34704000: system.physmem-port0: recvTiming: ReadReq 0x87000
34704000: system.physmem: enq: 0 ReadReq 0x87000
34704000: system.physmem: Read of size 64 on address 0x87000
34704000: system.physmem: 00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
34704000: system.physmem: 00000010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
34704000: system.physmem: 00000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
34704000: system.physmem: 00000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
...
34751000: system.cpu: syscall read called w/arguments 0,8192,1073750016,3
34751000: system.cpu.dcache: functional WriteReq 87000
34751000: system.coretol2buses: recvFunctional: packet src 2 dest 0 addr 0x87000 cmd WriteReq
34751000: system.cpu.icache: functional WriteReq 87000
34751000: system.l2: functional WriteReq 87000
34751000: system.membus: recvFunctional: packet src 2 dest 1 addr 0x87000 cmd WriteReq
34751000: system.physmem-port0: recvFunctional: WriteReq 0x87000
34751000: system.physmem: Write of size 64 on address 0x87000
34751000: system.physmem: 00000000 31 38 30 30 20 32 37 39 30 20 0a 32 32 34 20 32 1800 2790 224 2
34751000: system.physmem: 00000010 32 38 0a 32 32 34 20 32 35 37 0a 32 32 36 20 32 28 224 257 226 2
34751000: system.physmem: 00000020 34 36 0a 32 33 30 20 32 35 34 0a 32 33 31 20 32 46 230 254 231 2
34751000: system.physmem: 00000030 34 31 0a 32 33 32 20 32 36 36 0a 32 33 35 20 32 41 232 266 235 2

The following is the Exec trace.
(@__libc_read+24: svc is the producer system call and @_IO_new_file_underflow+293 is the consumer load.) So the producer-consumer relation is there in program order.

system.cpu T0 : @__libc_read+24 : svc : IntAlu :
system.cpu T0 : @__libc_read+28 : mov r7, r12 : IntAlu : D=0x0000000000000000
system.cpu T0 : @__libc_read+32 : cmns r0, #4096 : IntAlu : D=0x0000000000000000
system.cpu T0 : @__libc_read+36 : bxcc : IntAlu :
system.cpu T0 : @_IO_file_read+17 : add sp, sp, #8 : IntAlu : D=0x00000000befffbf8
system.cpu T0 : @_IO_file_read+19.0 : addi_uop r34, sp, #0 : IntAlu : D=0x00000000befffbf8
system.cpu T0 : @_IO_file_read+19.1 : ldr_uop r6, [r34, #0] : MemRead : D=0x0000000000000000 A=0xbefffbf8
system.cpu T0 : @_IO_file_read+19.2 : ldr_uop r35, [r34, #4] : MemRead : D=0x000000000000d1df A=0xbefffbfc
system.cpu T0 : @_IO_file_read+19.3 : addi_uop sp, sp, #8 : IntAlu : D=0x00000000befffc00
system.cpu T0 : @_IO_file_read+19.4 : uopReg_uop pc, r35 : IntAlu : D=0x000000000000d1df
system.cpu T0 : @_IO_new_file_underflow+261 : cmps r0, #0 : IntAlu : D=0x0000000000000001
system.cpu T0 : @_IO_new_file_underflow+263 : b : IntAlu : Predicated False
system.cpu T0 : @_IO_new_file_underflow+265 : ldr r1, [r5, #8] : MemRead : D=0x0000000040002000 A=0x717d0
system.cpu T0 : @_IO_new_file_underflow+267 : ldrd.w r2, r3, [r5, #80] : MemRead : D=0x00000000ffffffff A=0x71818
system.cpu T0 : @_IO_new_file_underflow+271 : adds r1, r1, r0 : IntAlu : D=0x0000000000000000
system.cpu T0 : @_IO_new_file_underflow+273 : str r1, [r5, #8] : MemWrite : D=0x0000000040004000 A=0x717d0
system.cpu T0 : @_IO_new_file_underflow+275 : cmps.w r2, #4294967295 : IntAlu : D=0x0000000000000001
system.cpu T0 : @_IO_new_file_underflow+279 : b : IntAlu :
system.cpu T0 : @_IO_new_file_underflow+355 : cmps.w r3, #4294967295 : IntAlu : D=0x0000000000000001
system.cpu T0 : @_IO_new_file_underflow+359 : b : IntAlu : Predicated False
system.cpu T0 : @_IO_new_file_underflow+361 : b : IntAlu :
system.cpu T0 : @_IO_new_file_underflow+291 : ldr r3, [r5, #4] : MemRead : D=0x0000000040002000 A=0x717cc
system.cpu T0 : @_IO_new_file_underflow+293 : ldrb r0, [r3, #0] : MemRead : D=0x0000000000000000 A=0x40002000

With the timing CPU, they are executed in order and everything is fine. With O3, it appears that the system call is emulated only when it reaches the head of the ROB, while the load goes ahead of it. I would imagine there is a mechanism that either

1. suppresses the load when there is an older system call, or
2. checks the loaded value later and squashes if it has changed.

(I haven't checked the relevant code yet.) Any idea why neither is working, or where I should look to find out?

Thanks,
Min
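
P.S. To make the two options above concrete, here is a minimal toy model in Python. It is only a sketch of the ordering hazard and is not gem5 code: `Inst`, `run`, and the policy names `none`, `stall`, and `replay` are all hypothetical, and the model simply assumes that loads execute as early as possible while the system call writes memory only at commit. The address 0x40002000 and the first four bytes ("1800") are taken from the traces above.

```python
from dataclasses import dataclass


@dataclass
class Inst:
    kind: str           # "syscall" (emulated read into memory) or "load"
    addr: int           # buffer address (syscall) or load address
    data: bytes = b""   # bytes the emulated syscall writes at commit


def run(program, memory, policy):
    """Toy out-of-order model (hypothetical, not gem5's implementation).

    policy:
      "none"   - loads execute early, nothing is checked (the observed bug)
      "stall"  - option 1: a load waits behind any older system call
      "replay" - option 2: re-check the loaded value at commit, redo on change
    Returns {instruction index: value seen by that load}.
    """
    early = {}            # values read speculatively in the execute stage
    syscall_seen = False

    # Execute stage: walk in program order, but loads read memory right away.
    for idx, inst in enumerate(program):
        if inst.kind == "syscall":
            syscall_seen = True          # not emulated yet, only at commit
        elif inst.kind == "load":
            if policy == "stall" and syscall_seen:
                continue                 # option 1: hold the load back
            early[idx] = memory.get(inst.addr, 0)

    # Commit stage: strictly in program order (head of the ROB).
    results = {}
    for idx, inst in enumerate(program):
        if inst.kind == "syscall":
            for off, byte in enumerate(inst.data):
                memory[inst.addr + off] = byte
        elif inst.kind == "load":
            current = memory.get(inst.addr, 0)
            value = early.get(idx)
            if value is None:
                value = current          # stalled load executes now
            elif policy == "replay" and value != current:
                value = current          # option 2: squash and re-execute
            results[idx] = value
    return results


if __name__ == "__main__":
    buf = 0x40002000  # 1073750016 in the syscall arguments
    program = [Inst("syscall", buf, b"1800"), Inst("load", buf)]
    for policy in ("none", "stall", "replay"):
        print(policy, run(program, {}, policy))
```

With policy `none` the load sees 0, which matches the D=0x0000000000000000 in my Exec trace; with `stall` or `replay` it sees 0x31, the first byte the system call wrote.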
