[Note: this is cross-posted between dri-devel and [EMAIL PROTECTED] ] I'm trying to debug a hung X server problem with DRI using the radeon driver. Sources are XFree86 4.3.0. This happens to be on ia64, but at the moment I don't see anything architecture-specific about the problem.
The symptom of the problem is the following message from the DRM radeon kernel driver:

    [drm:radeon_lock_take] *ERROR* x holds heavyweight lock

where x is a context id.

I've tracked the sequence of events down to the following. DRIFinishScreenInit is called during the radeon driver initialization, and inside DRIFinishScreenInit is this code snippet:

    /* Now that we have created the X server's context, we can grab the
     * hardware lock for the X server.
     */
    DRILock(pScreen, 0);
    pDRIPriv->grabbedDRILock = TRUE;

Slightly later, RADEONAdjustFrame is called, and it does the following:

    #ifdef XF86DRI
        if (info->CPStarted) DRILock(pScrn->pScreen, 0);
    #endif

It's this DRILock which is causing the "*ERROR* x holds heavyweight lock" message. The reason is that both DRIFinishScreenInit and RADEONAdjustFrame execute in the server and use the server's DRI lock. DRIFinishScreenInit never unlocks. It sets the grabbedDRILock flag, but big deal: no one ever references this flag. When RADEONAdjustFrame calls DRILock, the lock is already held, because DRIFinishScreenInit locked it and never unlocked it. On the second lock call the DRI kernel driver then suspends the X server process: DRM(lock_take) returns zero to DRM(lock) because the context holding the lock and the context requesting it are the same, and this causes DRM(lock) to put the X server on the lock wait queue. Putting the X server on the wait queue to wait for the lock to be released deadlocks the X server, because it is the very process holding the lock on its context.

Questions:

The whole crux of the problem seems to me to be the taking and holding of the lock in DRIFinishScreenInit. Why is this being done? I can't see a reason for it.

Why does it set a flag indicating it's holding the lock if nobody examines that flag?

Is suspending a process that already holds a lock during a lock request really the right behavior?
Granted, a process that's trying to lock twice without an intervening unlock is broken, but do we really want to deadlock that process?

Any other insights into this issue?

FWIW, I googled for this error and found several people who, starting around last spring, began seeing the same problem, but none of the mail threads had a follow-up solution.

Thanks,
John