On 2021-06-04 3:08 p.m., Michel Dänzer wrote:
> On 2021-06-04 2:33 p.m., Alex Deucher wrote:
>> On Fri, Jun 4, 2021 at 3:47 AM Michel Dänzer <mic...@daenzer.net> wrote:
>>>
>>> On 2021-05-19 3:57 p.m., Alex Deucher wrote:
>>>> On Wed, May 19, 2021 at 4:48 AM Michel Dänzer <mic...@daenzer.net> wrote:
>>>>>
>>>>> On 2021-05-19 12:05 a.m., Alex Deucher wrote:
>>>>>> On Tue, May 18, 2021 at 10:11 AM Michel Dänzer <mic...@daenzer.net>
>>>>>> wrote:
>>>>>>>
>>>>>>> On 2021-05-17 11:33 a.m., xgqt wrote:
>>>>>>>> Hello!
>>>>>>>>
>>>>>>>> I run a AMD laptop "81NC Lenovo IdeaPad S340-15API" - AMD Ryzen 5
>>>>>>>> 3500U with Radeon Vega 8 Graphics.
>>>>>>>> Recently some breakages started happening for me. In about 1h after
>>>>>>>> boot-up while using a KDE desktop machine GUI would freeze. Sometimes
>>>>>>>> it would be possible to move the mouse but the rest will be frozen.
>>>>>>>> Screen may start blinking or go black.
>>>>>>>>
>>>>>>>> I'm not sure if this is my kernel, firmware or the hardware.
>>>>>>>> I don't understands dmesg that's why I'm guessing, but I think it is
>>>>>>>> the firmware since this behavior started around 2021-05-15.
>>>>>>>> From my Portage logs I see that I updated my firmware on 2021-05-14 at
>>>>>>>> 18:16:06.
>>>>>>>> So breakages started with my kernel: 5.10.27 and FW: 20210511.
>>>>>>>> After breakage I jumped to a older kernel 5.4.97 and compiled 5.12.4.
>>>>>>>> I didn't notice a breakage on 5.4.97 but system ran ~40 minutes.
>>>>>>>> So I booted to newly compiled 5.12.4 where I was ~1h and it broke.
>>>>>>>> After that I booted to 5.4.97 again and downgraded my FW.
>>>>>>>> While I'm writing this I'm booted to kernel: 5.12.4 with FW: 20210315.
>>>>>>>>
>>>>>>>> I also described my situation on the Gentoo bugzilla:
>>>>>>>> https://bugs.gentoo.org/790566
>>>>>>>>
>>>>>>>> "dmesg.log" attached here is from the time machine run fine (at the
>>>>>>>> moment); "errors_sat_may_15_072825_pm_cest_2021.log" is a dmesg log
>>>>>>>> from the time system broke
>>>>>>>>
>>>>>>>> Can I get any help with this? What are the next steps I should take?
>>>>>>>> Any other files I should provide?
>>>>>>>
>>>>>>> I've hit similar hangs with a Lenovo ThinkPad E595 (Ryzen 7 3700U /
>>>>>>> Picasso / RAVEN 0x1002:0x15D8 0x17AA:0x5124 0xC1). I'm also suspecting
>>>>>>> them to be firware related. The hangs occurred with firmware from the
>>>>>>> AMD 20.50 release. I'm currently running with firmware from the 20.40
>>>>>>> release, no hang in almost 2 weeks (the hangs happened within 1-2 days
>>>>>>> after boot).
>>>>>>
>>>>>> Can you narrow down which firmware(s) cause the problem?
>>>>>
>>>>> I'll try, but note I'm not really sure yet my hangs were related to
>>>>> firmware (only). Anyway, I'll try narrowing it down.
>>>>
>>>> Thanks. Does this patch help?
>>>> https://patchwork.freedesktop.org/patch/433701/
>>>
>>> Unfortunately not. After no hangs for two weeks with older firmware, I just
>>> got a hang again within a day with newer firmware and a kernel with this
>>> fix.
>>>
>>>
>>> I'll try and narrow down which firmware triggers it now. Does Picasso use
>>> the picasso_*.bin ones only, or others as well?
>>
>> The picasso ones and raven_dmcu.bin.
>
> Thanks. raven_dmcu.bin hasn't changed, so I'm trying to bisect the 8 Picasso
> ones which have changed:
>
> picasso_asd.bin
> picasso_ce.bin
> picasso_me.bin
> picasso_mec2.bin
> picasso_mec.bin
> picasso_pfp.bin
> picasso_sdma.bin
> picasso_vcn.bin

Things are pointing to picasso_sdma.bin. I'm currently running with only
that one reverted to linux-firmware 20210315, and haven't got any hangs for
a week. Note that I've previously gone for a week without a hang even with
firmware which had hung before. So there's still a small chance that I'm
just on another lucky run.

That said, Pierre-Eric has also homed in on raven_sdma.bin for similar
hangs, and reverting to older firmware seems to have helped multiple people
on bug reports.

So, I think it makes sense for you guys to start looking for what could be
going wrong with the Picasso/Raven SDMA firmware from 20.50.

One thing I noticed is that the SDMA firmware from 20.50 advertises the same
feature version, but a *lower* firmware version than the one from 18.50. So
it might be worth double-checking that there wasn't an accidental downgrade
to some older version.

-- 
Earthling Michel Dänzer               |               https://redhat.com
Libre software enthusiast             |             Mesa and X developer
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
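
For anyone who wants to repeat the version check mentioned above (same
feature version, lower firmware version), here is a minimal Python sketch.
It assumes two saved text dumps, one taken with each firmware set loaded,
whose lines look like "NAME feature version: N, firmware version: 0xHEX".
That line shape is an assumption modelled on what the amdgpu driver's
amdgpu_firmware_info debugfs file prints. The function names and the sample
values are hypothetical placeholders, not numbers from any real release.

```python
import re

# Assumed line shape (not verified against every kernel version):
#   "<NAME> feature version: <decimal>, firmware version: 0x<hex>"
LINE_RE = re.compile(
    r"^(?P<name>.+?) feature version: (?P<feature>\d+), "
    r"firmware version: (?P<fw>0x[0-9a-fA-F]+)$"
)


def parse_firmware_info(text):
    """Return {block name: (feature version, firmware version)}."""
    versions = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            versions[match.group("name")] = (
                int(match.group("feature")),
                int(match.group("fw"), 16),
            )
    return versions


def suspicious_downgrades(older, newer):
    """List blocks whose feature version is unchanged but whose
    firmware version went down between the two dumps."""
    found = []
    for name, (old_feature, old_fw) in older.items():
        if name not in newer:
            continue
        new_feature, new_fw = newer[name]
        if new_feature == old_feature and new_fw < old_fw:
            found.append((name, old_fw, new_fw))
    return found


# Placeholder dumps: made-up values, for illustration only.
OLDER_DUMP = """\
SDMA0 feature version: 41, firmware version: 0x000000a9
VCN feature version: 0, firmware version: 0x0110901c
"""
NEWER_DUMP = """\
SDMA0 feature version: 41, firmware version: 0x000000a5
VCN feature version: 0, firmware version: 0x0110901c
"""

if __name__ == "__main__":
    older = parse_firmware_info(OLDER_DUMP)
    newer = parse_firmware_info(NEWER_DUMP)
    for name, old_fw, new_fw in suspicious_downgrades(older, newer):
        print(f"{name}: firmware version went from {old_fw:#x} to {new_fw:#x}")
```

A hit from this check only says that the version number went backwards. It
does not show that the older binary is what causes the hangs.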