Unfortunately, it's a known limitation that x86 locked accesses don't work in the gem5 classic memory system. This is documented (somewhat obscurely) here: http://gem5.org/Status_Matrix.
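
To make the failure mode concrete, here is a minimal sketch that replays the order of the INC_LOCKED_M micro-ops from the trace quoted below. It is a toy model, not gem5 code: the names `replay`, `enforce_lock` and `TRACE_ORDER` are made up for illustration, and the only assumption is that a working locked read (ldstl) should keep other cores off the location until the matching unlocking store (stul).

```python
# Toy model (not gem5 code): replay the ldstl/addi/stul micro-ops of
# `lock incl 0x10(%rax)` in the order they appear in the quoted trace.

# (core, micro-op) pairs touching data.started, in trace order.
TRACE_ORDER = [
    (2, "ldstl"), (2, "addi"), (2, "stul"),
    (3, "ldstl"), (3, "addi"),
    (0, "ldstl"), (0, "addi"),
    (3, "stul"),
    (0, "stul"),
]


def replay(schedule, enforce_lock):
    """Return the final counter value after replaying `schedule`.

    With enforce_lock=True, a core that has issued ldstl owns the location
    until its stul; another core's ldstl (and everything after it on that
    core) is deferred until the owner is done. This is an assumed, simplified
    model of what a locked access is supposed to guarantee.
    """
    memory = 0       # models data.started, initialised to 0
    temp = {}        # per-core temporary register (t1d in the trace)
    owner = None     # core currently between its ldstl and stul
    pending = list(schedule)

    while pending:
        stalled = set()
        for i, (core, op) in enumerate(pending):
            if core in stalled:
                continue  # keep program order within a stalled core
            if enforce_lock and op == "ldstl" and owner not in (None, core):
                stalled.add(core)  # location is held by another core
                continue
            break
        else:
            raise RuntimeError("no runnable micro-op in schedule")

        del pending[i]
        if op == "ldstl":
            temp[core] = memory
            owner = core
        elif op == "addi":
            temp[core] += 1
        elif op == "stul":
            memory = temp[core]
            owner = None
    return memory


print("without lock:", replay(TRACE_ORDER, enforce_lock=False))  # 2
print("with lock:   ", replay(TRACE_ORDER, enforce_lock=True))   # 3
```

Without the lock, cores 3 and 0 both load the value 1 and both store 2, which matches the `read:2` output that never reaches 3. With the (assumed) lock semantics, core 0's load is held back until core 3's store and the counter reaches 3.
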
If you'd be willing to take a stab at implementing this, that would be great. Your other options are either to switch to an ISA that uses LL/SC rather than locked accesses (like ARM or Alpha) or to switch from the classic memory system to Ruby (which does support locked accesses).

Steve

On Thu, Feb 7, 2013 at 9:21 PM, Jae-eon Jo <[email protected]> wrote:

> Hi all,
>
> I'm trying to simulate an x86 multicore system.
> Currently, the simulator can boot with 16 timing cores, make a checkpoint,
> and reload it with 16 detailed cores.
> To test it, I ran `parsec -a run -p blackscholes -i simsmall -n 4`, but it
> had not finished after 10 hours. It seems to make no actual progress.
> So I tried several configurations with different numbers of cores (3, 4, 8,
> 10, 12, 14, ...). None of them completed the boot process; each got stuck
> at a different phase.
>
> Further investigation revealed the problem. (Configuration:
> kernel=linux-2.6.22.9 num_cores=4)
>
> From the instruction trace near the point where it gets stuck, I found that
> only one core is alive, spinning on this code:
>
> // (__smp_call_function:arch/x86_64/kernel/smp.c)
>
> // 'data.started' is initialized to 0 before entering the loop, and
> // 'cpus == 3'.
>
> while (atomic_read(&data.started) != cpus) // wait until data.started ==
>                                            // cpus (3); atomic_read is no
>                                            // different from an ordinary load
>     cpu_relax(); // pause: nop with spinning hint
>
> The trace showed that 'data.started' had increased to 2 but not to 3.
>
> I also inserted 'printk' calls in the 'atomic_read' loop and at
> 'atomic_inc'.
> m5term:
>
> read:0
> inc:2
> inc:1
> inc:2
> read:2
> read:2
> read:2
> read:2
> read:2
> and so on...
>
> instruction trace (format: CPUID:0xADDR:DISASSEMBLY):
>
> 2:0xffffffff80215de9: INC_LOCKED_M.mfence
> 3:0xffffffff80215de2: MOV_R_P : rdip t7, %ctrl153,
> 0:0xffffffff80215ddf: MFENCE
> 3:0xffffffff80215de2: MOV_R_P : ld rax, DS:[t7 + 0x5fcb2f]
> 2:0xffffffff80215de9: INC_LOCKED_M : ldstl t1d, DS:[rax + 0x10]:N
> 2:0xffffffff80215de9: INC_LOCKED_M : addi t1d, t1d, 0x1
> 1:0xffffffff802159e2: CMP_M_R : ld t1d, DS:[rsp + 0x10]:N
> 1:0xffffffff802159e2: CMP_M_R : sub t0d, t1d, ebx
> 3:0xffffffff80215de9: INC_LOCKED_M.mfence
> 0:0xffffffff80215de2: MOV_R_P : rdip t7, %ctrl153,
> 0:0xffffffff80215de2: MOV_R_P : ld rax, DS:[t7 + 0x5fcb2f]
> 1:0xffffffff802159e6: JNZ_I : rdip t1, %ctrl153,
> 1:0xffffffff802159e6: JNZ_I : limm t2, 0xfffffffffffffff8
> 1:0xffffffff802159e6: JNZ_I : wrip , t1, t2
> 1:0xffffffff802159e0: NOP
> 0:0xffffffff80215de9: INC_LOCKED_M.mfence
> 2:0xffffffff80215de9: INC_LOCKED_M : stul t1d, DS:[rax + 0x10]:N
> 2:0xffffffff80215de9: INC_LOCKED_M.mfence
> 3:0xffffffff80215de9: INC_LOCKED_M : ldstl t1d, DS:[rax + 0x10]:N
> 3:0xffffffff80215de9: INC_LOCKED_M : addi t1d, t1d, 0x1
> 2:0xffffffff80215ded: CALL_NEAR_I : limm t1, 0xffffffffffff243e
> 2:0xffffffff80215ded: CALL_NEAR_I : rdip t7, %ctrl153,
> 2:0xffffffff80215ded: CALL_NEAR_I : st t7, SS:[rsp + 0xfffffffffffffff8]
> 2:0xffffffff80215ded: CALL_NEAR_I : subi rsp, rsp, 0x8
> 2:0xffffffff80215ded: CALL_NEAR_I : wrip , t7, t1
> 0:0xffffffff80215de9: INC_LOCKED_M : ldstl t1d, DS:[rax + 0x10]:N
> 0:0xffffffff80215de9: INC_LOCKED_M : addi t1d, t1d, 0x1
> 1:0xffffffff802159e2: CMP_M_R : ld t1d, DS:[rsp + 0x10]:N
> 1:0xffffffff802159e2: CMP_M_R : sub t0d, t1d, ebx
> 1:0xffffffff802159e6: JNZ_I : rdip t1, %ctrl153,
> 1:0xffffffff802159e6: JNZ_I : limm t2, 0xfffffffffffffff8
> 1:0xffffffff802159e6: JNZ_I : wrip , t1, t2
> 2:0xffffffff80208230: MOV_R_M : ld rax, GS:[0]
> 3:0xffffffff80215de9: INC_LOCKED_M : stul t1d, DS:[rax + 0x10]:N
> 3:0xffffffff80215de9: INC_LOCKED_M.mfence
> 1:0xffffffff802159e0: NOP
> 0:0xffffffff80215de9: INC_LOCKED_M : stul t1d, DS:[rax + 0x10]:N
> 0:0xffffffff80215de9: INC_LOCKED_M.mfence
>
> disassembly of vmlinux:
>
> ffffffff80215de9: f0 ff 40 10 lock incl 0x10(%rax)
>
>
> As you can see, core2 did ldstl(load0)/addi(set1)/stul(store1) with no
> interference.
> However, before core3 did stul(store2), core0 did ldstl(load1), resulting
> in stul(store2). M.mfence did not provide atomicity for the instruction at
> all. (Actually, mFence::execute(...) in timing_simple_cpu_exec.cc does
> nothing.)
>
> Is there anything wrong with my explanation? If not, I'll try to fix it,
> even though it does not look easy to me. Any advice is welcome.
> Thanks,
> Jae-eon Jo.
> _______________________________________________
> gem5-dev mailing list
> [email protected]
> http://m5sim.org/mailman/listinfo/gem5-dev
