On 2021-06-04 3:08 p.m., Michel Dänzer wrote:
> On 2021-06-04 2:33 p.m., Alex Deucher wrote:
>> On Fri, Jun 4, 2021 at 3:47 AM Michel Dänzer <mic...@daenzer.net> wrote:
>>>
>>> On 2021-05-19 3:57 p.m., Alex Deucher wrote:
>>>> On Wed, May 19, 2021 at 4:48 AM Michel Dänzer <mic...@daenzer.net> wrote:
>>>>>
>>>>> On 2021-05-19 12:05 a.m., Alex Deucher wrote:
>>>>>> On Tue, May 18, 2021 at 10:11 AM Michel Dänzer <mic...@daenzer.net> 
>>>>>> wrote:
>>>>>>>
>>>>>>> On 2021-05-17 11:33 a.m., xgqt wrote:
>>>>>>>> Hello!
>>>>>>>>
>>>>>>>> I run an AMD laptop "81NC Lenovo IdeaPad S340-15API" - AMD Ryzen 5 
>>>>>>>> 3500U with Radeon Vega 8 Graphics.
>>>>>>>> Recently some breakages started happening for me. About 1h after 
>>>>>>>> boot-up, while using a KDE desktop, the GUI would freeze. Sometimes 
>>>>>>>> it would be possible to move the mouse but the rest would be frozen. 
>>>>>>>> The screen may start blinking or go black.
>>>>>>>>
>>>>>>>> I'm not sure if this is my kernel, firmware or the hardware.
>>>>>>>> I don't understand dmesg, which is why I'm guessing, but I think it is 
>>>>>>>> the firmware since this behavior started around 2021-05-15.
>>>>>>>> From my Portage logs I see that I updated my firmware on 2021-05-14 at 
>>>>>>>> 18:16:06.
>>>>>>>> So breakages started with my kernel: 5.10.27 and FW: 20210511.
>>>>>>>> After the breakage I switched to an older kernel, 5.4.97, and compiled 
>>>>>>>> 5.12.4. I didn't notice a breakage on 5.4.97, but the system only ran 
>>>>>>>> for ~40 minutes.
>>>>>>>> So I booted into the newly compiled 5.12.4, which ran for ~1h before 
>>>>>>>> it broke.
>>>>>>>> After that I booted to 5.4.97 again and downgraded my FW.
>>>>>>>> While I'm writing this I'm booted to kernel: 5.12.4 with FW: 20210315.
>>>>>>>>
>>>>>>>> I also described my situation on the Gentoo bugzilla: 
>>>>>>>> https://bugs.gentoo.org/790566
>>>>>>>>
>>>>>>>> "dmesg.log" attached here is from a time when the machine was running 
>>>>>>>> fine (at the moment); "errors_sat_may_15_072825_pm_cest_2021.log" is a 
>>>>>>>> dmesg log from the time the system broke.
>>>>>>>>
>>>>>>>> Can I get any help with this? What are the next steps I should take? 
>>>>>>>> Any other files I should provide?
>>>>>>>
>>>>>>> I've hit similar hangs with a Lenovo ThinkPad E595 (Ryzen 7 3700U / 
>>>>>>> Picasso / RAVEN 0x1002:0x15D8 0x17AA:0x5124 0xC1). I'm also suspecting 
>>>>>>> them to be firmware related. The hangs occurred with firmware from the 
>>>>>>> AMD 20.50 release. I'm currently running with firmware from the 20.40 
>>>>>>> release, no hang in almost 2 weeks (the hangs happened within 1-2 days 
>>>>>>> after boot).
>>>>>>
>>>>>> Can you narrow down which firmware(s) cause the problem?
>>>>>
>>>>> I'll try, but note I'm not really sure yet my hangs were related to 
>>>>> firmware (only). Anyway, I'll try narrowing it down.
>>>>
>>>> Thanks.  Does this patch help?
>>>> https://patchwork.freedesktop.org/patch/433701/
>>>
>>> Unfortunately not. After no hangs for two weeks with older firmware, I just 
>>> got a hang again within a day with newer firmware and a kernel with this 
>>> fix.
>>>
>>>
>>> I'll try and narrow down which firmware triggers it now. Does Picasso use 
>>> the picasso_*.bin ones only, or others as well?
>>
>> The picasso ones and raven_dmcu.bin.
> 
> Thanks. raven_dmcu.bin hasn't changed, so I'm trying to bisect the 8 Picasso 
> ones which have changed:
> 
> picasso_asd.bin
> picasso_ce.bin
> picasso_me.bin
> picasso_mec2.bin
> picasso_mec.bin
> picasso_pfp.bin
> picasso_sdma.bin
> picasso_vcn.bin

Things are pointing to picasso_sdma.bin. I'm currently running with only that 
one reverted to linux-firmware 20210315, and haven't got any hangs for a week.

Note that I've previously gone for a week without a hang even with firmware 
which had hung before. So there's still a small chance that I'm just on another 
lucky run.
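
For anyone repeating this, the bisection over the eight changed files amounts 
to roughly the sketch below. It is only an illustration: bisect_culprit and 
the hangs_with callback are hypothetical names, and it assumes exactly one 
file is responsible and that each test run gives a trustworthy answer. As 
noted above, a hang-free week is not proof, so a "no hang" result deserves 
less trust than a "hang" result.

```python
from typing import Callable, Sequence, Set


def bisect_culprit(
    candidates: Sequence[str],
    hangs_with: Callable[[Set[str]], bool],
) -> str:
    """Return the one file whose new version triggers the hang.

    hangs_with(new_files) is a hypothetical callback supplied by the tester:
    install the NEW version of exactly the files in new_files, keep the OLD
    version of every other candidate, run the machine long enough, and
    return True if it hung.

    Assumes a single culprit and a reliable test; neither is guaranteed.
    """
    if not candidates:
        raise ValueError("need at least one candidate file")
    remaining = list(candidates)
    while len(remaining) > 1:
        # Put the new firmware in place for the first half only.
        half = remaining[: len(remaining) // 2]
        if hangs_with(set(half)):
            remaining = half                   # culprit is in the tested half
        else:
            remaining = remaining[len(half):]  # culprit is in the other half
    return remaining[0]
```

With eight candidates this takes three test rounds, each of which may need 
a week or more of uptime before "no hang" means anything.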

That said, Pierre-Eric has also homed in on raven_sdma.bin for similar hangs, 
and reverting to older firmware seems to have helped multiple people on bug 
reports.

So, I think it makes sense for you guys to start looking for what could be 
going wrong with the Picasso/Raven SDMA firmware from 20.50. One thing I 
noticed is that the SDMA firmware from 20.50 advertises the same feature 
version, but a *lower* firmware version than the one from 18.50. So it might be 
worth double-checking that there wasn't an accidental downgrade to some older 
version.
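
The version comparison can be repeated directly on the .bin files with 
something like the sketch below. The header layout is an assumption, modelled 
on my reading of common_firmware_header and sdma_firmware_header_v1_0 in the 
kernel's amdgpu_ucode.h, so please check it against the kernel source before 
trusting the numbers. The names read_sdma_versions, SdmaVersions and 
looks_like_downgrade are made up for this example.

```python
import struct
from typing import NamedTuple

# ASSUMED little-endian layout of the common firmware header (32 bytes):
#   u32 size_bytes, u32 header_size_bytes,
#   u16 header_version_major, u16 header_version_minor,
#   u16 ip_version_major, u16 ip_version_minor,
#   u32 ucode_version, u32 ucode_size_bytes,
#   u32 ucode_array_offset_bytes, u32 crc32
_COMMON_HEADER = struct.Struct("<IIHHHHIIII")
# ASSUMED: the SDMA v1.0 header continues with u32 ucode_feature_version.
_SDMA_FEATURE = struct.Struct("<I")


class SdmaVersions(NamedTuple):
    feature_version: int
    ucode_version: int


def read_sdma_versions(blob: bytes) -> SdmaVersions:
    """Extract feature and firmware (ucode) version from raw file contents."""
    if len(blob) < _COMMON_HEADER.size + _SDMA_FEATURE.size:
        raise ValueError("too short to contain an SDMA firmware header")
    fields = _COMMON_HEADER.unpack_from(blob, 0)
    ucode_version = fields[6]
    (feature_version,) = _SDMA_FEATURE.unpack_from(blob, _COMMON_HEADER.size)
    return SdmaVersions(feature_version, ucode_version)


def looks_like_downgrade(old: SdmaVersions, new: SdmaVersions) -> bool:
    """True if the feature version is unchanged but the firmware version is lower."""
    return (new.feature_version == old.feature_version
            and new.ucode_version < old.ucode_version)
```

Usage would be along the lines of reading both copies of picasso_sdma.bin 
with open(path, "rb").read() (decompressing first if the distribution ships 
compressed firmware) and passing the two results to looks_like_downgrade.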


-- 
Earthling Michel Dänzer               |               https://redhat.com
Libre software enthusiast             |             Mesa and X developer
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
