[Public]

Thanks for narrowing this down.  There is new PCO SDMA firmware available 
(attached).  Can you try it?

Thanks,

Alex
________________________________
From: amd-gfx <amd-gfx-boun...@lists.freedesktop.org> on behalf of Michel 
Dänzer <mic...@daenzer.net>
Sent: Thursday, June 24, 2021 6:51 AM
To: Alex Deucher <alexdeuc...@gmail.com>
Cc: xgqt <x...@riseup.net>; amd-gfx list <amd-gfx@lists.freedesktop.org>
Subject: Re: AMDGPU error: "[drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* 
Waiting for fences timed out!"

On 2021-06-04 3:08 p.m., Michel Dänzer wrote:
> On 2021-06-04 2:33 p.m., Alex Deucher wrote:
>> On Fri, Jun 4, 2021 at 3:47 AM Michel Dänzer <mic...@daenzer.net> wrote:
>>>
>>> On 2021-05-19 3:57 p.m., Alex Deucher wrote:
>>>> On Wed, May 19, 2021 at 4:48 AM Michel Dänzer <mic...@daenzer.net> wrote:
>>>>>
>>>>> On 2021-05-19 12:05 a.m., Alex Deucher wrote:
>>>>>> On Tue, May 18, 2021 at 10:11 AM Michel Dänzer <mic...@daenzer.net> 
>>>>>> wrote:
>>>>>>>
>>>>>>> On 2021-05-17 11:33 a.m., xgqt wrote:
>>>>>>>> Hello!
>>>>>>>>
>>>>>>>> I run an AMD laptop "81NC Lenovo IdeaPad S340-15API" - AMD Ryzen 5 
>>>>>>>> 3500U with Radeon Vega 8 Graphics.
>>>>>>>> Recently some breakages started happening for me. About 1h after 
>>>>>>>> boot-up, while using a KDE desktop, the GUI would freeze. Sometimes 
>>>>>>>> it would be possible to move the mouse but the rest would be frozen. 
>>>>>>>> The screen might start blinking or go black.
>>>>>>>>
>>>>>>>> I'm not sure if this is my kernel, firmware or the hardware.
>>>>>>>> I don't understand dmesg, which is why I'm guessing, but I think it 
>>>>>>>> is the firmware since this behavior started around 2021-05-15.
>>>>>>>> From my Portage logs I see that I updated my firmware on 2021-05-14 at 
>>>>>>>> 18:16:06.
>>>>>>>> So breakages started with my kernel: 5.10.27 and FW: 20210511.
>>>>>>>> After the breakage I switched to an older kernel, 5.4.97, and compiled 
>>>>>>>> 5.12.4. I didn't notice a breakage on 5.4.97, but the system only ran 
>>>>>>>> ~40 minutes.
>>>>>>>> So I booted the newly compiled 5.12.4, which ran ~1h before it broke.
>>>>>>>> After that I booted to 5.4.97 again and downgraded my FW.
>>>>>>>> While I'm writing this I'm booted to kernel: 5.12.4 with FW: 20210315.
>>>>>>>>
>>>>>>>> I also described my situation on the Gentoo bugzilla: 
>>>>>>>> https://bugs.gentoo.org/790566
>>>>>>>>
>>>>>>>> "dmesg.log" attached here is from a time when the machine runs fine 
>>>>>>>> (at the moment); "errors_sat_may_15_072825_pm_cest_2021.log" is a dmesg 
>>>>>>>> log from the time the system broke.
>>>>>>>>
>>>>>>>> Can I get any help with this? What are the next steps I should take? 
>>>>>>>> Any other files I should provide?
>>>>>>>
>>>>>>> I've hit similar hangs with a Lenovo ThinkPad E595 (Ryzen 7 3700U / 
>>>>>>> Picasso / RAVEN 0x1002:0x15D8 0x17AA:0x5124 0xC1). I also suspect 
>>>>>>> them to be firmware related. The hangs occurred with firmware from the 
>>>>>>> AMD 20.50 release. I'm currently running with firmware from the 20.40 
>>>>>>> release, no hang in almost 2 weeks (the hangs happened within 1-2 days 
>>>>>>> after boot).
>>>>>>
>>>>>> Can you narrow down which firmware(s) cause the problem?
>>>>>
>>>>> I'll try, but note I'm not really sure yet my hangs were related to 
>>>>> firmware (only). Anyway, I'll try narrowing it down.
>>>>
>>>> Thanks.  Does this patch help?
>>>> https://patchwork.freedesktop.org/patch/433701/
>>>
>>> Unfortunately not. After no hangs for two weeks with older firmware, I just 
>>> got a hang again within a day with newer firmware and a kernel with this 
>>> fix.
>>>
>>>
>>> I'll try and narrow down which firmware triggers it now. Does Picasso use 
>>> the picasso_*.bin ones only, or others as well?
>>
>> The picasso ones and raven_dmcu.bin.
>
> Thanks. raven_dmcu.bin hasn't changed, so I'm trying to bisect the 8 Picasso 
> ones which have changed:
>
> picasso_asd.bin
> picasso_ce.bin
> picasso_me.bin
> picasso_mec2.bin
> picasso_mec.bin
> picasso_pfp.bin
> picasso_sdma.bin
> picasso_vcn.bin

Things are pointing to picasso_sdma.bin. I'm currently running with only that 
one reverted to linux-firmware 20210315, and haven't got any hangs for a week.
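
For anyone repeating this kind of narrowing-down, the bisection over the eight changed files can be sketched roughly as follows. This is only an illustration, not the procedure actually followed in this thread: `bisect_firmware` and the `hangs_with_new` callback are hypothetical names, and the sketch assumes exactly one file is responsible and that each trial gives a reliable answer. With intermittent hangs, every trial has to run long enough (a week or more here) before "no hang" means much.

```python
from typing import Callable, FrozenSet, Sequence

# The eight Picasso firmware files that changed, as listed in the thread.
PICASSO_FILES = [
    "picasso_asd.bin",
    "picasso_ce.bin",
    "picasso_me.bin",
    "picasso_mec2.bin",
    "picasso_mec.bin",
    "picasso_pfp.bin",
    "picasso_sdma.bin",
    "picasso_vcn.bin",
]


def bisect_firmware(
    files: Sequence[str],
    hangs_with_new: Callable[[FrozenSet[str]], bool],
) -> str:
    """Return the one file whose new version triggers the hang.

    hangs_with_new(new_set) is a hypothetical callback: it should report
    whether the system hangs when only the files in new_set use the new
    firmware and every other file is reverted to the old one.
    Assumes a single culprit and a trustworthy answer for each trial.
    """
    candidates = list(files)
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        if hangs_with_new(frozenset(half)):
            candidates = half                    # culprit is in the tested half
        else:
            candidates = candidates[len(half):]  # culprit is in the other half
    return candidates[0]
```

With eight files this needs three trials, so at roughly a week per trial the whole search takes about three weeks.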

Note that I've previously gone for a week without a hang even with firmware 
which had hung before. So there's still a small chance that I'm just on another 
lucky run.

That said, Pierre-Eric has also homed in on raven_sdma.bin for similar hangs, 
and reverting to older firmware seems to have helped multiple people on bug 
reports.

So, I think it makes sense for you guys to start looking for what could be 
going wrong with the Picasso/Raven SDMA firmware from 20.50. One thing I 
noticed is that the SDMA firmware from 20.50 advertises the same feature 
version, but a *lower* firmware version than the one from 18.50. So it might be 
worth double-checking that there wasn't an accidental downgrade to some older 
version.
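
The version comparison described above could be checked mechanically along these lines. This is a minimal sketch: it assumes the text comes from the amdgpu debugfs file `amdgpu_firmware_info` (typically under `/sys/kernel/debug/dri/0/`) and that its lines look like `SDMA0 feature version: N, firmware version: 0x...`. The function names are made up for illustration, and the sample values are placeholders, not the actual versions from the 18.50 or 20.50 releases.

```python
import re
from typing import Dict, Tuple

# Assumed line format: "<NAME> feature version: <int>, firmware version: 0x<hex>"
_LINE = re.compile(
    r"^(?P<name>.+?) feature version: (?P<feature>\d+), "
    r"firmware version: (?P<fw>0x[0-9a-fA-F]+)"
)


def parse_firmware_info(text: str) -> Dict[str, Tuple[int, int]]:
    """Map each firmware name to (feature_version, firmware_version)."""
    versions = {}
    for line in text.splitlines():
        match = _LINE.match(line.strip())
        if match:  # lines in other formats are skipped
            versions[match.group("name")] = (
                int(match.group("feature")),
                int(match.group("fw"), 16),
            )
    return versions


def is_suspicious_downgrade(old: Tuple[int, int], new: Tuple[int, int]) -> bool:
    """True if the feature version is unchanged but the firmware version dropped."""
    return new[0] == old[0] and new[1] < old[1]


# Placeholder samples for illustration only (not real release values).
SAMPLE_OLD = "SDMA0 feature version: 41, firmware version: 0x000000a9\n"
SAMPLE_NEW = "SDMA0 feature version: 41, firmware version: 0x000000a3\n"

old = parse_firmware_info(SAMPLE_OLD)["SDMA0"]
new = parse_firmware_info(SAMPLE_NEW)["SDMA0"]
print("possible accidental downgrade:", is_suspicious_downgrade(old, new))
```

Capturing the file once per firmware set (after a reboot with each set installed) gives the two inputs to compare.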


--
Earthling Michel Dänzer               |               https://redhat.com
Libre software enthusiast             |             Mesa and X developer
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Attachment: picasso_sdma.bin
Description: picasso_sdma.bin
