Re: [gem5-dev] Atomic operations in X86 multicore FS has a bug

Steve Reinhardt Thu, 07 Feb 2013 21:45:38 -0800

Unfortunately it's a known limitation that x86 locked accesses don't work
in the gem5 classic memory system.  This fact is documented (somewhat
obscurely) here: http://gem5.org/Status_Matrix.


If you'd be willing to take a stab at implementing this, that would be
great.  Your other options are either to switch to an ISA that uses LL/SC
rather than locked accesses (like ARM or Alpha) or to switch from the
classic memory system to Ruby (which does support locked accesses).
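
To make the difference concrete, here is a rough toy model in Python. It
is not gem5 code: the function, its parameters, and the "version" counter
that stands in for an LL/SC reservation are all made up for illustration.
It replays the order of counter accesses from the trace quoted below
(core 2 completes, then cores 3 and 0 overlap). A plain load/add/store
sequence loses one increment, while a store-conditional that fails and
retries does not.

```python
# Toy model only; hypothetical names, not gem5 internals.
# Order in which cores touch the counter in the quoted trace:
# core 2: load, add, store; core 3: load, add; core 0: load, add;
# then core 3 stores, then core 0 stores.
TRACE_ORDER = [2, 2, 2, 3, 3, 0, 0, 3, 0]


def run(order, use_ll_sc):
    """Step each core through load / add / store on one shared counter.

    `order` lists which core executes its next micro-step. Once it is
    exhausted, unfinished cores are run to completion one at a time.
    """
    counter = 0
    version = 0    # bumped on every successful store
    temp = {}      # per-core temporary (t1d in the trace)
    linked = {}    # version each core observed at load time
    step = {core: 0 for core in set(order)}  # 0=load 1=add 2=store 3=done

    pending = list(order)
    while any(s < 3 for s in step.values()):
        if pending:
            core = pending.pop(0)
        else:
            core = next(c for c in sorted(step) if step[c] < 3)
        if step[core] == 3:
            continue
        if step[core] == 0:
            temp[core] = counter
            linked[core] = version
        elif step[core] == 1:
            temp[core] += 1
        else:
            if use_ll_sc and linked[core] != version:
                # Store-conditional fails: someone stored since our load.
                step[core] = 0
                continue
            counter = temp[core]
            version += 1
        step[core] += 1
    return counter


print(run(TRACE_ORDER, use_ll_sc=False))  # unprotected read-modify-write
print(run(TRACE_ORDER, use_ll_sc=True))   # failed store retries from the load
```

With the trace order, the unprotected run ends at 2 and the retrying run
ends at 3. A real LL/SC implementation tracks a reservation on the address
rather than a global version number, so treat this only as an illustration
of why the retry closes the window.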

Steve



On Thu, Feb 7, 2013 at 9:21 PM, Jae-eon Jo <[email protected]> wrote:

> Hi all,
>
> I'm trying to simulate an X86 multicore system.
> Currently, the simulator can boot 16 cores with the timing CPU, make a
> checkpoint, and reload it with 16 cores on the detailed CPU.
> To test it, I executed `parsec -a run -p blackscholes -i simsmall -n 4`,
> but it had not finished after 10 hours. It seems there is no actual
> progress.
> So, I tried several configurations with different numbers of cores (3, 4, 8,
> 10, 12, 14, ...). None of them completed the boot process; each got stuck
> at a different phase of it.
>
> Further investigation revealed the problem. (Configuration:
> kernel=linux-2.6.22.9 num_cores=4)
>
> From the instruction trace near the point where it gets stuck, I found
> that only one core is alive, spinning on this code:
>
> // (__smp_call_function:arch/x86_64/kernel/smp.c)
>
> // 'data.started' is initialized to 0 before entering the loop, and
> // 'cpus == 3'.
>
> // Wait until data.started == cpus (3); atomic_read is no different from
> // an ordinary load.
> while (atomic_read(&data.started) != cpus)
>     cpu_relax();  // pause: nop with spinning hint
>
> The trace showed that 'data.started' had increased to 2 but not to 3.
>
> I also inserted 'printk' calls in the 'atomic_read' loop and at
> 'atomic_inc'. Output from m5term:
>
> read:0
> inc:2
> inc:1
> inc:2
> read:2
> read:2
> read:2
> read:2
> read:2
> and so on...
>
> instruction trace (format: CPUID:0xADDR:DISASSEMBLY):
>
> 2:0xffffffff80215de9:   INC_LOCKED_M.mfence
> 3:0xffffffff80215de2:  MOV_R_P : rdip   t7, %ctrl153,
> 0:0xffffffff80215ddf:  MFENCE
> 3:0xffffffff80215de2:  MOV_R_P : ld   rax, DS:[t7 + 0x5fcb2f]
> 2:0xffffffff80215de9:  INC_LOCKED_M : ldstl   t1d, DS:[rax + 0x10]:N
> 2:0xffffffff80215de9:  INC_LOCKED_M : addi   t1d, t1d, 0x1
> 1:0xffffffff802159e2:  CMP_M_R : ld   t1d, DS:[rsp + 0x10]:N
> 1:0xffffffff802159e2:  CMP_M_R : sub   t0d, t1d, ebx
> 3:0xffffffff80215de9:   INC_LOCKED_M.mfence
> 0:0xffffffff80215de2:  MOV_R_P : rdip   t7, %ctrl153,
> 0:0xffffffff80215de2:  MOV_R_P : ld   rax, DS:[t7 + 0x5fcb2f]
> 1:0xffffffff802159e6:  JNZ_I : rdip   t1, %ctrl153,
> 1:0xffffffff802159e6:  JNZ_I : limm   t2, 0xfffffffffffffff8
> 1:0xffffffff802159e6:  JNZ_I : wrip   , t1, t2
> 1:0xffffffff802159e0:  NOP
> 0:0xffffffff80215de9:   INC_LOCKED_M.mfence
> 2:0xffffffff80215de9:  INC_LOCKED_M : stul   t1d, DS:[rax + 0x10]:N
> 2:0xffffffff80215de9:   INC_LOCKED_M.mfence
> 3:0xffffffff80215de9:  INC_LOCKED_M : ldstl   t1d, DS:[rax + 0x10]:N
> 3:0xffffffff80215de9:  INC_LOCKED_M : addi   t1d, t1d, 0x1
> 2:0xffffffff80215ded:  CALL_NEAR_I : limm   t1, 0xffffffffffff243e
> 2:0xffffffff80215ded:  CALL_NEAR_I : rdip   t7, %ctrl153,
> 2:0xffffffff80215ded:  CALL_NEAR_I : st   t7, SS:[rsp + 0xfffffffffffffff8]
> 2:0xffffffff80215ded:  CALL_NEAR_I : subi   rsp, rsp, 0x8
> 2:0xffffffff80215ded:  CALL_NEAR_I : wrip   , t7, t1
> 0:0xffffffff80215de9:  INC_LOCKED_M : ldstl   t1d, DS:[rax + 0x10]:N
> 0:0xffffffff80215de9:  INC_LOCKED_M : addi   t1d, t1d, 0x1
> 1:0xffffffff802159e2:  CMP_M_R : ld   t1d, DS:[rsp + 0x10]:N
> 1:0xffffffff802159e2:  CMP_M_R : sub   t0d, t1d, ebx
> 1:0xffffffff802159e6:  JNZ_I : rdip   t1, %ctrl153,
> 1:0xffffffff802159e6:  JNZ_I : limm   t2, 0xfffffffffffffff8
> 1:0xffffffff802159e6:  JNZ_I : wrip   , t1, t2
> 2:0xffffffff80208230:  MOV_R_M : ld   rax, GS:[0]
> 3:0xffffffff80215de9:  INC_LOCKED_M : stul   t1d, DS:[rax + 0x10]:N
> 3:0xffffffff80215de9:   INC_LOCKED_M.mfence
> 1:0xffffffff802159e0:  NOP
> 0:0xffffffff80215de9:  INC_LOCKED_M : stul   t1d, DS:[rax + 0x10]:N
> 0:0xffffffff80215de9:   INC_LOCKED_M.mfence
>
> disassembly of vmlinux:
>
> ffffffff80215de9:   f0 ff 40 10             lock incl 0x10(%rax)
>
>
> As you can see, core 2 performed ldstl (loads 0), addi (computes 1), and
> stul (stores 1) without interference.
> However, before core 3 executed its stul (storing 2), core 0 executed its
> ldstl (loading 1), so core 0's stul also stored 2 and one increment was
> lost. The .mfence microops did not make the instruction atomic at all.
> (In fact, mFence::execute(...) in timing_simple_cpu_exec.cc does nothing.)
>
> Is there any problem with my explanation? If not, I'll try to fix it, even
> though it does not look easy to me. Any advice is welcome.
> Thanks,
> Jae-eon Jo.
> _______________________________________________
> gem5-dev mailing list
> [email protected]
> http://m5sim.org/mailman/listinfo/gem5-dev
>
