On second thought, the instructions following the system call should have been flushed and re-executed once the system call was emulated.
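To make the idea concrete, here is a toy sketch of the hazard and of the squash-after-syscall fix. It is not gem5 code: `Inst`, `simulate`, and the two-entry ROB are hypothetical names invented for illustration, and the only values taken from the traces are the buffer address 0x40002000 and the byte 0x31 that the emulated read() writes.

```python
from dataclasses import dataclass
from typing import Optional

BUF = 0x40002000  # buffer address seen in the quoted Exec trace


@dataclass
class Inst:
    kind: str                     # "syscall" or "load"
    addr: int
    result: Optional[int] = None  # value obtained by a load


def simulate(squash_after_syscall: bool) -> int:
    """Return the value the consumer load ends up committing."""
    memory = {BUF: 0x00}
    # Program order: the syscall (producer) is older than the load (consumer).
    rob = [Inst("syscall", BUF), Inst("load", BUF)]

    # Execute stage: nothing holds the load back, so it reads memory
    # before the syscall has been emulated.
    for inst in rob:
        if inst.kind == "load":
            inst.result = memory[inst.addr]

    # Commit stage, in order: the syscall is emulated at the head of the ROB.
    for i, inst in enumerate(rob):
        if inst.kind == "syscall":
            memory[inst.addr] = 0x31  # emulated read() fills the buffer
            if squash_after_syscall:
                # Squash every younger instruction and re-execute it.
                for younger in rob[i + 1:]:
                    if younger.kind == "load":
                        younger.result = memory[younger.addr]

    return rob[-1].result


print(hex(simulate(False)), hex(simulate(True)))
```

Without the squash the load commits the stale value; with it, the load sees the data written by the system call. A real pipeline would refetch the younger instructions rather than patch their results in place, which the sketch simplifies.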
On Tue, Oct 11, 2011 at 1:18 PM, Min Kyu Jeong <[email protected]> wrote:
> Hi, all
>
> In SE O3 CPU mode, I am experiencing a memory ordering issue between a load
> and a system call.
> The expected scenario is that the system call is executed (emulated) and puts
> data in memory (syscall read), then the load consumes it.
> However, the load (memory read) somehow reaches memory ahead of the
> execution of the system call and fetches the old value.
>
> The following is the gem5 trace showing this behavior.
>
> 34690000: system.cpu.dcache: ReadReq 87000 miss
> 34692000: system.coretol2buses: recvTiming: src 2 dst -1 ReadReq 0x87000
> 34692000: system.l2: ReadReq 87000 miss
> 34692000: system.coretol2buses: The bus is now occupied from tick 34692000 to 34693000
> 34693000: system.cpu.icache: ReadReq (ifetch) 5200 hit
> 34704000: system.membus: recvTiming: src 2 dst -1 ReadReq 0x87000
> 34704000: system.physmem-port0: recvTiming: ReadReq 0x87000
> 34704000: system.physmem: enq: 0 ReadReq 0x87000
> 34704000: system.physmem: Read of size 64 on address 0x87000
> 34704000: system.physmem: 00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 34704000: system.physmem: 00000010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 34704000: system.physmem: 00000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 34704000: system.physmem: 00000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> ...
> 34751000: system.cpu: syscall read called w/arguments 0,8192,1073750016,3
> 34751000: system.cpu.dcache: functional WriteReq 87000
> 34751000: system.coretol2buses: recvFunctional: packet src 2 dest 0 addr 0x87000 cmd WriteReq
> 34751000: system.cpu.icache: functional WriteReq 87000
> 34751000: system.l2: functional WriteReq 87000
> 34751000: system.membus: recvFunctional: packet src 2 dest 1 addr 0x87000 cmd WriteReq
> 34751000: system.physmem-port0: recvFunctional: WriteReq 0x87000
> 34751000: system.physmem: Write of size 64 on address 0x87000
> 34751000: system.physmem: 00000000 31 38 30 30 20 32 37 39 30 20 0a 32 32 34 20 32  1800 2790 224 2
> 34751000: system.physmem: 00000010 32 38 0a 32 32 34 20 32 35 37 0a 32 32 36 20 32  28 224 257 226 2
> 34751000: system.physmem: 00000020 34 36 0a 32 33 30 20 32 35 34 0a 32 33 31 20 32  46 230 254 231 2
> 34751000: system.physmem: 00000030 34 31 0a 32 33 32 20 32 36 36 0a 32 33 35 20 32  41 232 266 235 2
>
> The following is the Exec trace. (@__libc_read+24: svc is the producer
> system call and @_IO_new_file_underflow+293 is the consumer load.) So the
> producer-consumer relation is there in program order.
>
> system.cpu T0 : @__libc_read+24 : svc : IntAlu :
> system.cpu T0 : @__libc_read+28 : mov r7, r12 : IntAlu : D=0x0000000000000000
> system.cpu T0 : @__libc_read+32 : cmns r0, #4096 : IntAlu : D=0x0000000000000000
> system.cpu T0 : @__libc_read+36 : bxcc : IntAlu :
> system.cpu T0 : @_IO_file_read+17 : add sp, sp, #8 : IntAlu : D=0x00000000befffbf8
> system.cpu T0 : @_IO_file_read+19.0 : addi_uop r34, sp, #0 : IntAlu : D=0x00000000befffbf8
> system.cpu T0 : @_IO_file_read+19.1 : ldr_uop r6, [r34, #0] : MemRead : D=0x0000000000000000 A=0xbefffbf8
> system.cpu T0 : @_IO_file_read+19.2 : ldr_uop r35, [r34, #4] : MemRead : D=0x000000000000d1df A=0xbefffbfc
> system.cpu T0 : @_IO_file_read+19.3 : addi_uop sp, sp, #8 : IntAlu : D=0x00000000befffc00
> system.cpu T0 : @_IO_file_read+19.4 : uopReg_uop pc, r35 : IntAlu : D=0x000000000000d1df
> system.cpu T0 : @_IO_new_file_underflow+261 : cmps r0, #0 : IntAlu : D=0x0000000000000001
> system.cpu T0 : @_IO_new_file_underflow+263 : b : IntAlu : Predicated False
> system.cpu T0 : @_IO_new_file_underflow+265 : ldr r1, [r5, #8] : MemRead : D=0x0000000040002000 A=0x717d0
> system.cpu T0 : @_IO_new_file_underflow+267 : ldrd.w r2, r3, [r5, #80] : MemRead : D=0x00000000ffffffff A=0x71818
> system.cpu T0 : @_IO_new_file_underflow+271 : adds r1, r1, r0 : IntAlu : D=0x0000000000000000
> system.cpu T0 : @_IO_new_file_underflow+273 : str r1, [r5, #8] : MemWrite : D=0x0000000040004000 A=0x717d0
> system.cpu T0 : @_IO_new_file_underflow+275 : cmps.w r2, #4294967295 : IntAlu : D=0x0000000000000001
> system.cpu T0 : @_IO_new_file_underflow+279 : b : IntAlu :
> system.cpu T0 : @_IO_new_file_underflow+355 : cmps.w r3, #4294967295 : IntAlu : D=0x0000000000000001
> system.cpu T0 : @_IO_new_file_underflow+359 : b : IntAlu : Predicated False
> system.cpu T0 : @_IO_new_file_underflow+361 : b : IntAlu :
> system.cpu T0 : @_IO_new_file_underflow+291 : ldr r3, [r5, #4] : MemRead : D=0x0000000040002000 A=0x717cc
> system.cpu T0 : @_IO_new_file_underflow+293 : ldrb r0, [r3, #0] : MemRead : D=0x0000000000000000 A=0x40002000
>
> With the timing CPU, they are executed in order and everything is fine. It
> appears that the system call is emulated at the head of the ROB, while the
> load goes ahead. I would imagine there would be a mechanism that either
> 1. suppresses the load when there is an older system call, or
> 2. checks the loaded value later and squashes if it has changed.
> (I haven't checked the relevant code yet.) Any idea why they are not
> working, or where I should look to find out?
>
> Thanks,
>
> Min
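As a sanity check on the quoted traces, the numbers can be cross-checked with a few lines of Python. This is only a sketch: the variable names are made up, the values are copied from the traces, and it assumes (as the original post implies) that virtual address 0x40002000 maps to the physical block at 0x87000.

```python
# Values copied from the quoted traces.
syscall_buf_arg = 1073750016  # third value printed for "syscall read"
load_vaddr = 0x40002000       # A= field of the ldrb at _IO_new_file_underflow+293
load_data = 0x00              # D= field of that ldrb

# First 16 bytes written to 0x87000 by the functional WriteReq at tick 34751000.
written = bytes.fromhex("31 38 30 30 20 32 37 39 30 20 0a 32 32 34 20 32")

same_buffer = syscall_buf_arg == load_vaddr  # syscall argument matches the load address
text = written.decode("ascii")               # what the buffer holds after the syscall
stale = load_data != written[0]              # the load did not see the first byte

print(hex(syscall_buf_arg), repr(text), stale)
```

The decimal syscall argument is the same address the load reads, and the byte the load returned (0x00) differs from the first byte the system call wrote (0x31, ASCII '1'), which is consistent with the load having read memory before the system call was emulated.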
_______________________________________________ gem5-users mailing list [email protected] http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
