The locking problem is solved, my original analysis was incorrect. The
problem was that DRM_CAS was not correctly implemented on IA64. Thus
this was an IA64 issue only, this is consistent with others who showed
up in a google search describing the problem, all were on IA64.

I have filed an XFree86 bug report on this. I could not find a DRI
specific bug reporting mechanism other than the dri-devel list.

The IA64 implementation of CAS was this:

#define DRM_CAS(lock,old,new,__ret)                                       \
        do {                                                              \
                unsigned int __result, __old = (old);                     \
                __asm__ __volatile__(                                     \
                        "mf\n"                                            \
                        "mov ar.ccv=%2\n"                                 \
                        ";;\n"                                            \
                        "cmpxchg4.acq %0=%1,%3,ar.ccv"                    \
                        : "=r" (__result), "=m" (__drm_dummy_lock(lock))  \
                        : "r" (__old), "r" (new)                          \
                        : "memory");                                      \
                __ret = (__result) != (__old);                            \
        } while (0)

The problem was with the data types given to the cmpxchg4
instruction. All of the lock types in DRM are int's and on IA64 thats
4 bytes wide. The digit suffix cmpxchg4 signifies this instruction
operates on a 4 byte quantity. One might expect then since this
instruction operates on 4 byte values and in DRM the locks are 4 bytes
everything is fine, but it isn't.

The cmpxchg4 instruction operates this way: 

cmpxchg4 r1=[r3],r2,ar.ccv

4 bytes are read at the address pointed to by r3, that 32 bit value is
then zero extended to 64 bits. The 64 bit value is then compared to
the 64 bit value stored in appliation register CCV. If the two 64 bit
values are equal then the least significant 4 bytes in r2 are written
back to the address pointed to by r3. The original value pointed to by
r3 is stored in r1. The entire operation is atomic.

The mistake in the DRM_CAS implemenation is that the comparison is 64
bits wide, thus the value stored in ar.ccv (%2 in the asm) must be 64
bits wide and for us that means zero extending the 32 bit "old"
parameter to 64 bits.

Because of the way GCC asm blocks work to tie C variables and data
types to asm values the promotion of "old" from unsigned int to
unsigned long was not happening. Thus when "old" was stored into
ar.ccv its most significant 32 bits contained garbage. (Actually
because of the way GCC generates constants it turns out the upper 32
bits was 0xffffffff, this was from the OR of DRM_LOCK_HELD which is
defined as 0x80000000, but the compiler generates a 64 bit OR
operation using the immediate value 0xffffffff80000000, which is legal
because the upper 32 bits are undefined on int (32 bit) operations).

The bottom line is that the test would fail when it shouldn't because
the high 32 bits in ar.ccv were not zero.

One might think that because "old" was assigned to __old in a local
block which was unsigned int the compiler would know enough when using
this value in the asm to have zero extended it. But that's not true,
in asm blocks its critical to define the asm value correctly so the
compiler can translate between the C code variable and what the asm
code is referring to.

The line:

                        : "r" (__old), "r" (new)

says %2 is mapped by ""r" (__old)", in other words put __old in a
general 64 bit register. We've told the compiler to put 64 bits of
"__old" into a register, but __old is a 32 bit value with its high
order 32 bits undefined. We need to tell the compiler to widen the
type when assigning it to a general register, thus the asm template
type definition needs to be modified with a cast to unsigned long.

                        : "r" ((unsigned long)__old), "r" (new)

Only with this will the compiler know to widen the 32 bit __old value
to 64 bits inside the asm code.

Thanks to Jakub Jelinek who helped me understand the nuances of GCC
asm templates and type conversions.

As a minor side note, definitions of bit flags should be tagged as
unsigned. Thus things like:

#define DRM_LOCK_HELD  0x80000000
#define DRM_LOCK_CONT  0x40000000

should really be:

#define DRM_LOCK_HELD  0x80000000U
#define DRM_LOCK_CONT  0x40000000U

John


_______________________________________________
Devel mailing list
[EMAIL PROTECTED]
http://XFree86.Org/mailman/listinfo/devel

Reply via email to