John Dennis wrote:
[Note: this is cross posted between dri-devel and [EMAIL PROTECTED] ]

I'm trying to debug a hung X server problem with DRI using the radeon
driver. Sources are XFree86 4.3.0. This happens to be on ia64, but at
the moment I don't see anything architecture specific about the problem.

The symptom of the problem is the following message from the drm
radeon kernel driver:

[drm:radeon_lock_take] *ERROR* x holds heavyweight lock

where x is a context id. I've tracked the sequence of events down to
the following:

DRIFinishScreenInit is called during the radeon driver initialization.
Inside DRIFinishScreenInit is the following code snippet:

    /* Now that we have created the X server's context, we can grab the
     * hardware lock for the X server.
     */
    DRILock(pScreen, 0);
    pDRIPriv->grabbedDRILock = TRUE;

Slightly later on, RADEONAdjustFrame is called and it does the following:

#ifdef XF86DRI
    if (info->CPStarted) DRILock(pScrn->pScreen, 0);
#endif

It's this DRILock which is causing the "*ERROR* x holds heavyweight
lock" message. The reason is that both DRIFinishScreenInit and
RADEONAdjustFrame are executing in the server and using the server's
DRI lock. DRIFinishScreenInit never unlocks; it sets the
grabbedDRILock flag, big deal, no one ever references this flag. When
RADEONAdjustFrame calls DRILock it's already locked because
DRIFinishScreenInit locked and never unlocked. On the second lock call
the DRM kernel driver then suspends the X server process
(DRM(lock_take) returns zero to DRM(lock) because the context holding
the lock and the context requesting the lock are the same; this then
causes DRM(lock) to put the X server on the lock wait queue). Putting
the X server on the wait queue to wait for the lock to be released
then deadlocks the X server, because it's the process holding the lock
on its context.
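
The sequence described above can be illustrated with a toy model. This is only a sketch, not the real DRM code: struct toy_lock, toy_lock_take() and toy_would_self_deadlock() are hypothetical names invented for illustration, and the only behaviour modelled is the one reported here, namely that a take request from the context that already holds the lock is logged and refused.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy model only; the names and layout are assumptions, not DRM code. */
#define NO_HOLDER (-1)

struct toy_lock {
    int holder;   /* context id holding the lock, or NO_HOLDER */
};

/* Returns true if the lock was taken, false if the caller must wait.
 * A request from the context that already holds the lock is reported
 * and refused, as described for DRM(lock_take) above. */
static bool toy_lock_take(struct toy_lock *lock, int context)
{
    assert(context >= 0);
    if (lock->holder == NO_HOLDER) {
        lock->holder = context;
        return true;
    }
    if (lock->holder == context)
        fprintf(stderr, "*ERROR* %d holds heavyweight lock\n", context);
    return false;
}

/* A refused request can never be satisfied when the only context able
 * to release the lock is the one that is about to sleep waiting for it. */
static bool toy_would_self_deadlock(const struct toy_lock *lock, int context)
{
    return lock->holder == context;
}
```

Taking the lock twice with the same context id makes the second toy_lock_take() return false while toy_would_self_deadlock() returns true, which corresponds to the X server being queued behind a lock only it can release.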

Questions:

The whole crux of the problem seems to me to be the taking and holding
of the lock in DRIFinishScreenInit. Why is this being done?

It is done because the X server expects to be holding the lock whenever it is between the Wakeup & Block handlers. The odd man out is when the server first starts up: it won't have acquired the lock by coming in through the Wakeup handler, so the lock gets acquired at this point instead.


The rest of the DRI initialization code needs to be holding the lock, so we have to grab it somewhere.
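
As a rough illustration of that invariant, here is a toy model. It is a sketch under stated assumptions, not the DRI code: struct toy_server and the three toy_* functions are hypothetical, and the lock is reduced to a single flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the invariant; all names here are hypothetical. */
struct toy_server {
    bool holds_lock;
};

/* Wakeup: the server has work to do, so it takes the lock. */
static void toy_wakeup_handler(struct toy_server *s)
{
    assert(!s->holds_lock);
    s->holds_lock = true;
}

/* Block: the server is about to sleep, so it releases the lock. */
static void toy_block_handler(struct toy_server *s)
{
    assert(s->holds_lock);
    s->holds_lock = false;
}

/* Startup: no Wakeup handler has run yet, so initialization takes the
 * lock itself to establish the same state. */
static void toy_finish_screen_init(struct toy_server *s)
{
    assert(!s->holds_lock);
    s->holds_lock = true;
}
```

After toy_finish_screen_init() the model is in the same state as after a Wakeup, so the first toy_block_handler() call finds the lock held, just as every later one does.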

The problem seems to be that RADEONAdjustFrame() is designed to be called from cursor handling routines that are executed outside the Wakeup/Block handlers (perhaps this came in with SilkenMouse?) but is being called during initialization after the point the lock is grabbed.

I haven't deeply investigated this but two solutions spring to mind:
- Hack: Move the call to RADEONAdjustFrame() during initialization to before the lock is grabbed.
- Better: Replace the call to RADEONAdjustFrame() during initialization with something like:


    if (info->FBDev) {
        fbdevHWAdjustFrame(scrnIndex, x, y, flags);
    } else {
        RADEONDoAdjustFrame(pScrn, x, y, FALSE);
    }

which is basically what RADEONAdjustFrame() wraps.
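
To make the wrapper/inner split concrete, here is a toy model. It is only a sketch: struct toy_screen, toy_do_adjust_frame() and toy_adjust_frame() are hypothetical stand-ins, the lock is a single flag, and the wrapper is assumed to release the lock after the inner call. Where the real server would hang, the model returns false.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; not the radeon driver. */
struct toy_screen {
    bool cp_started;   /* stands in for info->CPStarted */
    bool lock_held;    /* stands in for the server's DRI lock */
    int  frame_x;
    int  frame_y;
};

/* Inner routine: sets the frame origin and never touches the lock.
 * The caller is expected to hold the lock already. */
static void toy_do_adjust_frame(struct toy_screen *s, int x, int y)
{
    assert(s);
    s->frame_x = x;
    s->frame_y = y;
}

/* Wrapper for callers that do NOT hold the lock. Returns false where
 * the real server would deadlock on a lock it already holds. */
static bool toy_adjust_frame(struct toy_screen *s, int x, int y)
{
    assert(s);
    if (s->cp_started) {
        if (s->lock_held)
            return false;
        s->lock_held = true;
    }
    toy_do_adjust_frame(s, x, y);
    if (s->cp_started)
        s->lock_held = false;   /* assumed unlock after the inner call */
    return true;
}
```

With lock_held already true, as it is during initialization, toy_adjust_frame() fails, while calling toy_do_adjust_frame() directly updates the frame and leaves the lock state alone.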

Keith

_______________________________________________
Devel mailing list
[EMAIL PROTECTED]
http://XFree86.Org/mailman/listinfo/devel