https://bugzilla.kernel.org/show_bug.cgi?id=19002

           Summary: Radeon rv730 AGP/KMS/DRM kernel lockup
           Product: Drivers
           Version: 2.5
    Kernel Version: 2.6.36-rc5
          Platform: All
        OS/Version: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: Video(DRI - non Intel)
        AssignedTo: drivers_video-...@kernel-bugs.osdl.org
        ReportedBy: 1i5t5.dun...@cox.net
                CC: r...@sisk.pl, maciej.rute...@gmail.com
        Regression: Yes


+++ This bug was initially created as a clone of Bug #17702 +++

This is a follow-on to bug #17702, which I filed, and bug #17201, which it was
marked a duplicate of.  I mentioned in #17201 that the fix only solved part of
my problem: it gets me farther into starting X/KDE, but I still end up with a
crash, and it is now worse.  Before, it was an X crash that left the kernel
running; now it's a hard kernel lockup.  I asked whether I should file another
bug and was told to file it, so here it is, though it took some time to get
back to it.

Hardware again:  Older dual-socket, dual-core Opteron 290, AMD 8xxx chipset,
Radeon HD 4650/RV730 (AGP).

Software: Gentoo/~amd64 Linux, xorg-server 1.9.0, xf86-video-ati 6.13.1, gcc
4.5.1, kde (also) 4.5.1.

The kernel config is attached to the previous bug.

The current situation (as of 2.6.36-rc5 plus 49 commits):

When I start KDE, it now gets to the desktop but, with my ordinary activity
config, freezes almost immediately.  I traced that freeze to a single
plasmoid, the comic-strip plasmoid.  With it deleted, or deconfigured so that
all it shows at KDE startup is a configure button instead of a rendered comic,
I get a working but highly unstable X/KDE that tends to lock up within a few
minutes as I work with windows, etc.  If I hit that configure button and load
a comic, it appears to fetch it from the net, then locks up as soon as it
tries to render it, just as it does when it's configured at startup.  In
short: rendering a comic (any comic) in that plasmoid causes an immediate hard
kernel lockup, and even with the plasmoid disabled the system is still very
unstable and locks up within a few minutes.

That's with DRI enabled in xorg.conf.d.  If I uncomment the Disable "dri" line
in the modules section, thus disabling DRI, I have a stable (but incredibly
slow and boring) system.  So it's definitely DRI related.

Back on rc3, in connection with the previous bug, I reverted the bisected
commit and again had a stable system.  Twice I ran with that commit reverted
for several days without rebooting, with full DRI.  Without that revert, but
with the patch said to fix that bug, the system behaves as described above
whenever DRI is enabled: it reliably locks up within a few minutes, or almost
immediately upon reaching the desktop if that plasmoid is configured.  It was
that way with the patch applied directly to rc3, and it's still that way with
a "pure" rc5+49 today.

After rc3 I ran with 44437579efca258e3c4a09f59838c8f933611990 reverted for some
time, with the system stable for days.  Yesterday I updated and tested pure
mainline again.  It still locked up, so I switched to my revert branch again. 
There was a single conflict in drivers/gpu/drm/radeon/r600.c.  After resolving
it, I built and rebooted, and that's what I'm running now.  It works fine as
long as that revert and conflict resolution are applied.

Question:  In the commit I'm reverting
(44437579efca258e3c4a09f59838c8f933611990), in a couple of places, there's this:

if ((rdev->family >= CHIP_RV770) && (rdev->family <= CHIP_RV740)) 

I believe I found where the families are defined, in radeon_family.h, and the
order is strange: 770 < 730 < 710 < 740.  That explains the seemingly reversed
logic in that if, and it means an RV730 takes the IF branch.  My chip is an
RV730 (both as reported by the kernel, and based on the radeon manpage table
entry for an HD 4650).  Might it be on the wrong side of the if?  It looks to
me like the ELSE is identical to the previous (working) behavior, so maybe my
RV730 should be falling through to the ELSE?
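
To make the ordering point concrete, here is a minimal standalone sketch in C.
It is not the kernel's code: the enum is a trimmed excerpt that only preserves
the relative order described above, CHIP_BELOW_RANGE and CHIP_ABOVE_RANGE are
hypothetical placeholders for whatever entries sit on either side, and the
helper names are mine.

```c
#include <assert.h>
#include <stdbool.h>

/* Trimmed, illustrative excerpt.  Only the relative order of the four
 * RV7xx entries is taken from the report (770 < 730 < 710 < 740).
 * CHIP_BELOW_RANGE and CHIP_ABOVE_RANGE are hypothetical placeholders
 * standing in for the neighbouring entries of the real enum. */
enum radeon_family_excerpt {
    CHIP_BELOW_RANGE,
    CHIP_RV770,
    CHIP_RV730,
    CHIP_RV710,
    CHIP_RV740,
    CHIP_ABOVE_RANGE
};

/* Same comparison as the condition quoted from the commit, applied to
 * a bare family value instead of rdev->family. */
static inline bool in_rv770_to_rv740_range(int family)
{
    return (family >= CHIP_RV770) && (family <= CHIP_RV740);
}

/* With the order above, an RV730 satisfies the condition, so it takes
 * the IF branch rather than the ELSE. */
static inline void check_rv730_takes_if_branch(void)
{
    assert(in_rv770_to_rv740_range(CHIP_RV730));
}
```

So under that enum order the check covers all four of RV770, RV730, RV710 and
RV740, which is why my card ends up on the new code path.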

Otherwise... maybe they corrected the bug in later production runs, or
perhaps in the AGP bridge (if such a thing is possible), since I think the chip
is native PCIe and needs a bridge for AGP?  Is there a simple test I could run
to see if that bug really does apply, and/or some serial/batch/revision number
that could be used to distinguish between runs with and without the bug?
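
One number I can read without special tools is the PCI revision ID, which is
the byte at offset 0x08 of the standard PCI configuration header and is
exposed through the per-device sysfs "config" file.  I don't know whether it
distinguishes boards with and without the bug; the sketch below only shows how
to read it.  The function names are my own, and the device address in the
comment is a hypothetical example.

```c
#include <assert.h>
#include <stdio.h>

/* Revision ID byte in the standard PCI configuration header. */
#define PCI_REVISION_ID_OFFSET 0x08

/* Return the revision ID (0-255) from a raw config-space buffer,
 * or -1 if the buffer is missing or too short to contain it. */
static inline int pci_revision_from_config(const unsigned char *cfg, size_t len)
{
    if (cfg == NULL || len <= PCI_REVISION_ID_OFFSET)
        return -1;
    return cfg[PCI_REVISION_ID_OFFSET];
}

/* Read the start of a sysfs config file and extract the revision ID.
 * Example path (hypothetical device address):
 *   /sys/bus/pci/devices/0000:01:00.0/config
 * Returns -1 if the file cannot be opened or is too short. */
static inline int pci_revision_from_sysfs(const char *config_path)
{
    unsigned char cfg[16];
    size_t n;
    FILE *f = fopen(config_path, "rb");

    if (f == NULL)
        return -1;
    n = fread(cfg, 1, sizeof(cfg), f);
    fclose(f);
    return pci_revision_from_config(cfg, n);
}
```

If someone can say which revision values (if any) correspond to affected
chips, I can report what mine says.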

Because hardware bug or not, it sure seems like on my hardware it was working
fine as it was, and now we're just screwing things up.


