You're probably right as far your diagnosis and fix for the problem, but please give me a chance to verify before you check in.

The M, R and P suffixes are for the different operand types, and as I'm sure you guessed, M means memory and R means register. P means RIP relative memory reference, and is separated out because those versions need to put rip into a regular microcode register before actually doing the access. This is partially an effect of the microop ISA I chose predating 64 bit mode and RIP relative addressing.

Gabe

Quoting "Krishna, Tushar" <[email protected]>:

Hi,
I ran into what could be a bug in the implementation of the XCHG macro-op in X86. I was trying to run a binary built using m5threads through X86_SE, and observed incorrect mutex-locking behavior. Closer inspection showed this (execution trace, I have copied only the relevant micro-ops):

177896500: system.cpu05 T0 : @pthread_mutex_unlock+19.1 : MOV_M_I : st t1b, DS:[rbx + 0x4] : MemWrite : D=0x0000000000000000 A=0x695dc4
...
177900000: system.cpu06 T0 : @pthread_mutex_lock+33.0 : XCHG_M_R : ldst t1b, DS:[rdx] : MemRead : D=0x0000000000401700 A=0x695dc4 177900000: system.cpu07 T0 : @pthread_mutex_lock+33.0 : XCHG_M_R : ldst t1b, DS:[rdx] : MemRead : D=0x0000000000401700 A=0x695dc4 177900500: system.cpu07 T0 : @pthread_mutex_lock+33.1 : XCHG_M_R : st al, DS:[rdx] : MemWrite : D=0x0000000000000001 A=0x695dc4 177900500: system.cpu06 T0 : @pthread_mutex_lock+33.1 : XCHG_M_R : st al, DS:[rdx] : MemWrite : D=0x0000000000000001 A=0x695dc4
...

cpu05 does an unlock, and then *BOTH* cpu06 and cpu07 get the lock!
The reason is that both try and do a XCHG_M_R (atomic exchange between memory and register) and succeed!

The XCHG implementation (src/arch/x86/isa/insts/general_purpose/data_transfer/xchg.py)
has a comment saying
# All the memory versions need to use LOCK, regardless of if it was set

However, the descpription of XCHG_M_R is
def macroop XCHG_M_R
{
    ldst t1, seg, sib, disp
    st reg, seg, sib, disp
    mov reg, reg, t1
};

while the description of XCHG_LOCKED_M_R is
def macro-op XCHG_LOCKED_M_R
{
    ldstl t1, seg, sib, disp
    stul reg, seg, sib, disp
    mov reg, reg, t1
};


Changing XCHG_M_R to use the locked micro-ops (ldstl and stul) fixed the incorrect mutex-locking issue. I am not sure if this is a bug, or is there some reason for not using the locked micro-ops?

There are similar, unlocked micro-ops used for XCHG_R_M, XCHG_R_P and XCHG_P_R ... I am not sure if they all need to be changed to locked?
[What does the P in R_P  mean? Is this also referencing memory?].
If they all need to be changed to use ldstl and stul, I can create a patch for the same and check it in.

Thanks,
Tushar




_______________________________________________
m5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/m5-dev

Reply via email to