[Public]

Thanks for narrowing this down. There is new PCO SDMA firmware available
(attached). Can you try it?

Thanks,

Alex
________________________________
From: amd-gfx <amd-gfx-boun...@lists.freedesktop.org> on behalf of Michel Dänzer <mic...@daenzer.net>
Sent: Thursday, June 24, 2021 6:51 AM
To: Alex Deucher <alexdeuc...@gmail.com>
Cc: xgqt <x...@riseup.net>; amd-gfx list <amd-gfx@lists.freedesktop.org>
Subject: Re: AMDGPU error: "[drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!"

On 2021-06-04 3:08 p.m., Michel Dänzer wrote:
> On 2021-06-04 2:33 p.m., Alex Deucher wrote:
>> On Fri, Jun 4, 2021 at 3:47 AM Michel Dänzer <mic...@daenzer.net> wrote:
>>>
>>> On 2021-05-19 3:57 p.m., Alex Deucher wrote:
>>>> On Wed, May 19, 2021 at 4:48 AM Michel Dänzer <mic...@daenzer.net> wrote:
>>>>>
>>>>> On 2021-05-19 12:05 a.m., Alex Deucher wrote:
>>>>>> On Tue, May 18, 2021 at 10:11 AM Michel Dänzer <mic...@daenzer.net>
>>>>>> wrote:
>>>>>>>
>>>>>>> On 2021-05-17 11:33 a.m., xgqt wrote:
>>>>>>>> Hello!
>>>>>>>>
>>>>>>>> I run an AMD laptop "81NC Lenovo IdeaPad S340-15API" - AMD Ryzen 5
>>>>>>>> 3500U with Radeon Vega 8 Graphics.
>>>>>>>> Recently some breakages started happening for me. In about 1h after
>>>>>>>> boot-up while using a KDE desktop machine GUI would freeze. Sometimes
>>>>>>>> it would be possible to move the mouse but the rest will be frozen.
>>>>>>>> Screen may start blinking or go black.
>>>>>>>>
>>>>>>>> I'm not sure if this is my kernel, firmware or the hardware.
>>>>>>>> I don't understand dmesg, that's why I'm guessing, but I think it is
>>>>>>>> the firmware since this behavior started around 2021-05-15.
>>>>>>>> From my Portage logs I see that I updated my firmware on 2021-05-14 at
>>>>>>>> 18:16:06.
>>>>>>>> So breakages started with my kernel: 5.10.27 and FW: 20210511.
>>>>>>>> After breakage I jumped to an older kernel 5.4.97 and compiled 5.12.4.
>>>>>>>> I didn't notice a breakage on 5.4.97 but system ran ~40 minutes.
>>>>>>>> So I booted to newly compiled 5.12.4 where I was ~1h and it broke.
>>>>>>>> After that I booted to 5.4.97 again and downgraded my FW.
>>>>>>>> While I'm writing this I'm booted to kernel: 5.12.4 with FW: 20210315.
>>>>>>>>
>>>>>>>> I also described my situation on the Gentoo bugzilla:
>>>>>>>> https://bugs.gentoo.org/790566
>>>>>>>>
>>>>>>>> "dmesg.log" attached here is from while the machine is running fine
>>>>>>>> (at the moment); "errors_sat_may_15_072825_pm_cest_2021.log" is a
>>>>>>>> dmesg log from the time the system broke
>>>>>>>>
>>>>>>>> Can I get any help with this? What are the next steps I should take?
>>>>>>>> Any other files I should provide?
>>>>>>>
>>>>>>> I've hit similar hangs with a Lenovo ThinkPad E595 (Ryzen 7 3700U /
>>>>>>> Picasso / RAVEN 0x1002:0x15D8 0x17AA:0x5124 0xC1). I'm also suspecting
>>>>>>> them to be firmware related. The hangs occurred with firmware from the
>>>>>>> AMD 20.50 release. I'm currently running with firmware from the 20.40
>>>>>>> release, no hang in almost 2 weeks (the hangs happened within 1-2 days
>>>>>>> after boot).
>>>>>>
>>>>>> Can you narrow down which firmware(s) cause the problem?
>>>>>
>>>>> I'll try, but note I'm not really sure yet my hangs were related to
>>>>> firmware (only). Anyway, I'll try narrowing it down.
>>>>
>>>> Thanks. Does this patch help?
>>>> https://patchwork.freedesktop.org/patch/433701/
>>>
>>> Unfortunately not. After no hangs for two weeks with older firmware, I just
>>> got a hang again within a day with newer firmware and a kernel with this
>>> fix.
>>>
>>>
>>> I'll try and narrow down which firmware triggers it now. Does Picasso use
>>> the picasso_*.bin ones only, or others as well?
>>
>> The picasso ones and raven_dmcu.bin.
>
> Thanks. raven_dmcu.bin hasn't changed, so I'm trying to bisect the 8 Picasso
> ones which have changed:
>
> picasso_asd.bin
> picasso_ce.bin
> picasso_me.bin
> picasso_mec2.bin
> picasso_mec.bin
> picasso_pfp.bin
> picasso_sdma.bin
> picasso_vcn.bin

Things are pointing to picasso_sdma.bin. I'm currently running with only that
one reverted to linux-firmware 20210315, and haven't got any hangs for a week.
Note that I've previously gone for a week without a hang even with firmware
which had hung before. So there's still a small chance that I'm just on
another lucky run.

That said, Pierre-Eric has also homed in on raven_sdma.bin for similar hangs,
and reverting to older firmware seems to have helped multiple people on bug
reports.

So, I think it makes sense for you guys to start looking for what could be
going wrong with the Picasso/Raven SDMA firmware from 20.50.

One thing I noticed is that the SDMA firmware from 20.50 advertises the same
feature version, but a *lower* firmware version than the one from 18.50. So it
might be worth double-checking that there wasn't an accidental downgrade to
some older version.

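
For anyone wanting to check the feature/firmware version pair on a running
system, here is a rough sketch that parses the firmware table amdgpu exposes
in debugfs and flags the "same feature version, lower firmware version" case.
The path /sys/kernel/debug/dri/0/amdgpu_firmware_info (card index 0, root
needed) and the line layout matched by LINE_RE are assumptions about the
running kernel; parse_firmware_info and is_downgrade are names made up for
this sketch, not part of any existing tool.

```python
import re
from pathlib import Path

# Assumption: debugfs is mounted and the GPU is card 0; reading usually needs root.
FW_INFO = Path("/sys/kernel/debug/dri/0/amdgpu_firmware_info")

# Assumed line layout: "<NAME> feature version: <dec>, firmware version: 0x<hex>"
LINE_RE = re.compile(
    r"^(?P<name>.+?) feature version: (?P<feature>\d+), "
    r"firmware version: 0x(?P<fw>[0-9a-fA-F]+)"
)


def parse_firmware_info(text):
    """Map each component name to a (feature_version, firmware_version) tuple."""
    versions = {}
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            versions[m["name"]] = (int(m["feature"]), int(m["fw"], 16))
    return versions


def is_downgrade(old, new):
    """True when the feature version is unchanged but the firmware version is lower."""
    return new[0] == old[0] and new[1] < old[1]


if __name__ == "__main__":
    try:
        table = parse_firmware_info(FW_INFO.read_text())
    except OSError as err:
        # Missing debugfs, wrong card index, or insufficient permissions.
        print(f"cannot read {FW_INFO}: {err}")
    else:
        for name, (feature, fw) in sorted(table.items()):
            if name.startswith("SDMA"):
                print(f"{name}: feature {feature}, firmware {fw:#010x}")
```

Booting once with each linux-firmware snapshot, saving the SDMA0 tuple each
time, and passing the two tuples to is_downgrade() would show whether the
newer snapshot really loads a lower firmware version.
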
-- 
Earthling Michel Dänzer               |               https://redhat.com/
Libre software enthusiast             |             Mesa and X developer
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
picasso_sdma.bin
Description: picasso_sdma.bin