https://bugzilla.kernel.org/show_bug.cgi?id=19002
Summary: Radeon rv730 AGP/KMS/DRM kernel lockup
Product: Drivers
Version: 2.5
Kernel Version: 2.6.36-rc5
Platform: All
OS/Version: Linux
Tree: Mainline
Status: NEW
Severity: normal
Priority: P1
Component: Video(DRI - non Intel)
AssignedTo: drivers_video-...@kernel-bugs.osdl.org
ReportedBy: 1i5t5.dun...@cox.net
CC: r...@sisk.pl, maciej.rute...@gmail.com
Regression: Yes

+++ This bug was initially created as a clone of Bug #17702 +++

This is a follow-on to bug #17702, which I filed, and bug #17201, which it was a dup of. I mentioned in #17201 that the fix only fixed part of my problem. It gets me farther into starting X/KDE, but I still end up with a crash, and it's now worse: before, it was an X crash that left the kernel running; now it's a hard kernel lock. I asked whether I should file another bug and was told to file it, so here it is, tho it took some time to get back to it.

Hardware again: Older dual-dual-core Opteron 290, AMD 8xxx chipset, Radeon hd4650/RV730 (AGP).

Software: Gentoo/~amd64 Linux, xorg-server 1.9.0, xf86-video-ati 6.13.1, gcc 4.5.1, kde (also) 4.5.1. The kernel config is attached to the previous bug.

The current situation (as of 2.6.36-rc5 plus 49 commits): When I start KDE, it now gets to the desktop but, with my ordinary activity config, freezes almost immediately. I traced that freeze down to a single plasmoid, the comic-strip plasmoid. With it deleted, or deconfigured so that all it shows when I start kde is a configure button instead of trying to render a comic, I get a working but highly unstable X/KDE which tends to crash within a few minutes as I work with windows, etc. If I hit that configure button and load a comic, it appears to fetch it from the net, then immediately crashes as it tries to render it, same as it does when it's configured at startup.
So trying to render a comic (any comic) in that plasmoid causes an immediate hard kernel lockup, but with the plasmoid disabled so it won't render a comic, the system is still very unstable and locks up within a few minutes.

That's with DRI enabled in xorg.conf.d. If I uncomment the Disable "dri" line in the modules section, thus disabling DRI, I have a stable (but incredibly slow and boring) system. So it's definitely DRI related.

Back on rc3, in connection with the previous bug, I reverted the commit in question (the one the bisect pointed to) and again had a stable system. I ran with that commit reverted for several days without rebooting, full DRI, etc, twice. But without that revert, and with the patch said to fix that bug applied instead, the system is as above: if DRI is enabled, it reliably crashes within a few minutes, or almost immediately upon reaching the desktop if I have that plasmoid configured. It was that way with the patch applied directly to rc3, and it's still that way with a "pure" rc5+49, today.

After rc3 I ran with 44437579efca258e3c4a09f59838c8f933611990 reverted for some time, with the system stable for days. Yesterday I updated and tested pure mainline again. It still locked up, so I switched to my revert branch again. There was a single conflict in drivers/gpu/drm/radeon/r600.c. After resolving it, I built and rebooted, and that's what I'm running now. It works fine as long as that revert and conflict resolution are applied...

Question: In the commit I'm reverting (44437579efca258e3c4a09f59838c8f933611990), in a couple places, there's this:

    if ((rdev->family >= CHIP_RV770) && (rdev->family <= CHIP_RV740))

I believe I found where the families are defined, in radeon_family.h, and the order is strange, 770 < 730 < 710 < 740, which explains the seemingly reversed logic in that if. But my chip is an RV730 (both as reported by the kernel, and based on the radeon manpage table entry for an hd4650). Might it be on the wrong side of the if?
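
To make the ordering question concrete, here is a minimal C sketch. It assumes only the relative order described above (RV770 < RV730 < RV710 < RV740). The enum, its two placeholder entries, and the function names are hypothetical stand-ins, not the real radeon_family.h definitions, and the numeric values are arbitrary. Under that assumption, an RV730 falls inside the range and takes the if branch rather than the else.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset: only the relative order of the four RV7xx
 * entries is taken from the report; everything else is a placeholder. */
enum sketch_family {
    SKETCH_CHIP_OLDER,  /* placeholder: any family listed before RV770 */
    SKETCH_CHIP_RV770,
    SKETCH_CHIP_RV730,
    SKETCH_CHIP_RV710,
    SKETCH_CHIP_RV740,
    SKETCH_CHIP_NEWER   /* placeholder: any family listed after RV740 */
};

/* Same shape as the test quoted from the commit: an inclusive range
 * check on enum position, not on the number in the chip name. */
bool sketch_in_rv7xx_range(enum sketch_family family)
{
    return (family >= SKETCH_CHIP_RV770) && (family <= SKETCH_CHIP_RV740);
}

/* Reports which branch of the quoted if/else a family would take. */
const char *sketch_branch_taken(enum sketch_family family)
{
    return sketch_in_rv7xx_range(family) ? "if" : "else";
}

/* Documents the ordering assumption the sketch relies on. */
void sketch_check_order(void)
{
    assert(SKETCH_CHIP_RV770 < SKETCH_CHIP_RV730);
    assert(SKETCH_CHIP_RV730 < SKETCH_CHIP_RV710);
    assert(SKETCH_CHIP_RV710 < SKETCH_CHIP_RV740);
}
```

Called from a small main(), sketch_branch_taken(SKETCH_CHIP_RV730) returns "if", while the two placeholder families return "else". So with this ordering the RV730 is covered by the range test as written; whether it should be is a separate question that the sketch cannot answer.
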
It looks to me like the ELSE is identical to the previous (working) behavior, so maybe my RV730 should be falling thru to the ELSE? Otherwise... maybe they corrected the bug in the later production runs, or perhaps in the AGP bridge (if such would be possible) since I think it's native PCIE and requires one?

Is there a simple test I could run to see if that bug really does apply, and/or some serial/batch/revision number that could be used to distinguish between runs with and without the bug? Because hardware bug or not, it sure seems like on my hardware it was working fine as it was, and now we're just screwing things up.