[Note: this is cross posted between dri-devel and [EMAIL PROTECTED] ]

I'm trying to debug a hung X server problem with DRI using the radeon
driver. Sources are XFree86 4.3.0. This happens to be on ia64, but at
the moment I don't see anything architecture specific about the problem.

The symptom of the problem is the following message from the drm
radeon kernel driver:

[drm:radeon_lock_take] *ERROR* x holds heavyweight lock

where x is a context id. I've tracked the sequence of events down to
the following:

DRIFinishScreenInit is called during the radeon driver initialization,
inside DRIFinishScreenInit is the following code snippet:

    /* Now that we have created the X server's context, we can grab the
     * hardware lock for the X server.
     */
    DRILock(pScreen, 0);
    pDRIPriv->grabbedDRILock = TRUE;

Slightly later on RADEONAdjustFrame is called and it does the following:

#ifdef XF86DRI
    if (info->CPStarted) DRILock(pScrn->pScreen, 0);
#endif

Its this DRILock which is causing the "*ERROR* x holds heavyweight
lock" message. The reason is both DRIFinishScreenInit and
RADEONAdjustFrame are executing in the server and using the servers
DRI lock. DRIFinishScreenInit never unlocks, it sets the
grabbedDRILock flag, big deal, no one ever references this flag. When
RADEONAdjustFrame calls DRILock its already locked because
DRIFinishScreenInit locked and never unlocked. The dri kernel driver
on the second lock call then suspends the X server process
(DRM(lock_take) returns zero to DRM(lock) because the context holding
the lock and context requesting the lock are the same, this then
causes DRM(lock) to put the X server on the lock wait queue). Putting
the X server on the wait queue waiting for the lock to be released
then deadlocks the X server because its the process holding the lock
on its context.

Questions:

The whole crux of the problem seems to me the taking and holding of
the lock in DRIFinishScreenInit. Why is this being done? I can't see a
reason for it. Why does it set a flag indicating its holding the lock
if nobody examines that flag?

Is suspending a process that already holds a lock during a lock request
really the right behavior? Granted, a process thats trying to lock twice
without an intervening unlock is broken, but do we really want to
deadlock that process?

Any other insights to this issue?

FWIW, I googled for this error and came up with several folks who
starting around last spring started seeing the same problem, but none
of the mail threads had a follow up solution.

Thanks,

John




-------------------------------------------------------
This sf.net email is sponsored by:ThinkGeek
Welcome to geek heaven.
http://thinkgeek.com/sf
_______________________________________________
Dri-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dri-devel

Reply via email to