You're probably right as far your diagnosis and fix for the problem,
but please give me a chance to verify before you check in.
The M, R and P suffixes are for the different operand types, and as
I'm sure you guessed, M means memory and R means register. P means RIP
relative memory reference, and is separated out because those versions
need to put rip into a regular microcode register before actually
doing the access. This is partially an effect of the microop ISA I
chose predating 64 bit mode and RIP relative addressing.
Gabe
Quoting "Krishna, Tushar" <[email protected]>:
Hi,
I ran into what could be a bug in the implementation of the XCHG
macro-op in X86.
I was trying to run a binary built using m5threads through X86_SE,
and observed incorrect mutex-locking behavior.
Closer inspection showed this (execution trace, I have copied only
the relevant micro-ops):
177896500: system.cpu05 T0 : @pthread_mutex_unlock+19.1 : MOV_M_I
: st t1b, DS:[rbx + 0x4] : MemWrite : D=0x0000000000000000
A=0x695dc4
...
177900000: system.cpu06 T0 : @pthread_mutex_lock+33.0 : XCHG_M_R
: ldst t1b, DS:[rdx] : MemRead : D=0x0000000000401700 A=0x695dc4
177900000: system.cpu07 T0 : @pthread_mutex_lock+33.0 : XCHG_M_R
: ldst t1b, DS:[rdx] : MemRead : D=0x0000000000401700 A=0x695dc4
177900500: system.cpu07 T0 : @pthread_mutex_lock+33.1 : XCHG_M_R
: st al, DS:[rdx] : MemWrite : D=0x0000000000000001 A=0x695dc4
177900500: system.cpu06 T0 : @pthread_mutex_lock+33.1 : XCHG_M_R
: st al, DS:[rdx] : MemWrite : D=0x0000000000000001 A=0x695dc4
...
cpu05 does an unlock, and then *BOTH* cpu06 and cpu07 get the lock!
The reason is that both try and do a XCHG_M_R (atomic exchange
between memory and register) and succeed!
The XCHG implementation
(src/arch/x86/isa/insts/general_purpose/data_transfer/xchg.py)
has a comment saying
# All the memory versions need to use LOCK, regardless of if it was set
However, the descpription of XCHG_M_R is
def macroop XCHG_M_R
{
ldst t1, seg, sib, disp
st reg, seg, sib, disp
mov reg, reg, t1
};
while the description of XCHG_LOCKED_M_R is
def macro-op XCHG_LOCKED_M_R
{
ldstl t1, seg, sib, disp
stul reg, seg, sib, disp
mov reg, reg, t1
};
Changing XCHG_M_R to use the locked micro-ops (ldstl and stul) fixed
the incorrect mutex-locking issue.
I am not sure if this is a bug, or is there some reason for not
using the locked micro-ops?
There are similar, unlocked micro-ops used for XCHG_R_M, XCHG_R_P
and XCHG_P_R ... I am not sure if they all need to be changed to
locked?
[What does the P in R_P mean? Is this also referencing memory?].
If they all need to be changed to use ldstl and stul, I can create a
patch for the same and check it in.
Thanks,
Tushar
_______________________________________________
m5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/m5-dev