The locking problem is solved, my original analysis was incorrect. The problem was that DRM_CAS was not correctly implemented on IA64. Thus this was an IA64 issue only, this is consistent with others who showed up in a google search describing the problem, all were on IA64.
I have filed an XFree86 bug report on this. I could not find a DRI specific bug reporting mechanism other than the dri-devel list. The IA64 implementation of CAS was this: #define DRM_CAS(lock,old,new,__ret) \ do { \ unsigned int __result, __old = (old); \ __asm__ __volatile__( \ "mf\n" \ "mov ar.ccv=%2\n" \ ";;\n" \ "cmpxchg4.acq %0=%1,%3,ar.ccv" \ : "=r" (__result), "=m" (__drm_dummy_lock(lock)) \ : "r" (__old), "r" (new) \ : "memory"); \ __ret = (__result) != (__old); \ } while (0) The problem was with the data types given to the cmpxchg4 instruction. All of the lock types in DRM are int's and on IA64 thats 4 bytes wide. The digit suffix cmpxchg4 signifies this instruction operates on a 4 byte quantity. One might expect then since this instruction operates on 4 byte values and in DRM the locks are 4 bytes everything is fine, but it isn't. The cmpxchg4 instruction operates this way: cmpxchg4 r1=[r3],r2,ar.ccv 4 bytes are read at the address pointed to by r3, that 32 bit value is then zero extended to 64 bits. The 64 bit value is then compared to the 64 bit value stored in appliation register CCV. If the two 64 bit values are equal then the least significant 4 bytes in r2 are written back to the address pointed to by r3. The original value pointed to by r3 is stored in r1. The entire operation is atomic. The mistake in the DRM_CAS implemenation is that the comparison is 64 bits wide, thus the value stored in ar.ccv (%2 in the asm) must be 64 bits wide and for us that means zero extending the 32 bit "old" parameter to 64 bits. Because of the way GCC asm blocks work to tie C variables and data types to asm values the promotion of "old" from unsigned int to unsigned long was not happening. Thus when "old" was stored into ar.ccv its most significant 32 bits contained garbage. (Actually because of the way GCC generates constants it turns out the upper 32 bits was 0xffffffff, this was from the OR of DRM_LOCK_HELD which is defined as 0x80000000, but the compiler generates a 64 bit OR operation using the immediate value 0xffffffff80000000, which is legal because the upper 32 bits are undefined on int (32 bit) operations). The bottom line is that the test would fail when it shouldn't because the high 32 bits in ar.ccv were not zero. One might think that because "old" was assigned to __old in a local block which was unsigned int the compiler would know enough when using this value in the asm to have zero extended it. But that's not true, in asm blocks its critical to define the asm value correctly so the compiler can translate between the C code variable and what the asm code is referring to. The line: : "r" (__old), "r" (new) says %2 is mapped by ""r" (__old)", in other words put __old in a general 64 bit register. We've told the compiler to put 64 bits of "__old" into a register, but __old is a 32 bit value with its high order 32 bits undefined. We need to tell the compiler to widen the type when assigning it to a general register, thus the asm template type definition needs to be modified with a cast to unsigned long. : "r" ((unsigned long)__old), "r" (new) Only with this will the compiler know to widen the 32 bit __old value to 64 bits inside the asm code. Thanks to Jakub Jelinek who helped me understand the nuances of GCC asm templates and type conversions. As a minor side note, definitions of bit flags should be tagged as unsigned. Thus things like: #define DRM_LOCK_HELD 0x80000000 #define DRM_LOCK_CONT 0x40000000 should really be: #define DRM_LOCK_HELD 0x80000000U #define DRM_LOCK_CONT 0x40000000U John _______________________________________________ Devel mailing list [EMAIL PROTECTED] http://XFree86.Org/mailman/listinfo/devel