[Mesa-dev] [Bug 108900] [KBL-G][Vulkan] Non-recoverable GPU hangs with GfxBench v5 Aztec Ruins Vulkan test
https://bugs.freedesktop.org/show_bug.cgi?id=108900

--- Comment #15 from Alex Deucher ---
(In reply to Eero Tamminen from comment #14)
> After looking at kernel firmware repo, I wonder whether the problem is
> firmware:
> https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/
>
> VegaM hasn't been updated since it has been added, almost a year ago:

It hasn't been updated because there have not been any changes internally. I always update all asic firmware when updates are available.

--
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 108900] [KBL-G][Vulkan] Non-recoverable GPU hangs with GfxBench v5 Aztec Ruins Vulkan test
https://bugs.freedesktop.org/show_bug.cgi?id=108900

Eero Tamminen changed:

What   |Removed  |Added
Status |RESOLVED |VERIFIED

--- Comment #14 from Eero Tamminen ---
(In reply to Samuel Pitoiset from comment #13)
> I asked for a copy of this benchmark (the Linux version) and they told me
> that it will no longer be supported, so don't expect further releases of
> Aztec. Closing because we just can't debug it.

After looking at kernel firmware repo, I wonder whether the problem is firmware:
https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/

VegaM hasn't been updated since it has been added, almost a year ago:

$ git log --format=fuller vegam*
commit 153a51e438cafe07610b28db0304b1721b91d847
Author:     Alex Deucher
AuthorDate: Tue Jul 10 15:53:14 2018 -0500
Commit:     Josh Boyer
CommitDate: Tue Jul 17 07:54:55 2018 -0400

    amdgpu: add initial VegaM firmware

Whereas other Vega firmware was updated this month:

$ git log -1 --format=fuller vega*
commit 92e17d0dd2437140fab044ae62baf69b35d7d1fa (HEAD -> master, origin/master, origin/HEAD)
Author:     Alex Deucher
AuthorDate: Mon Apr 29 08:50:27 2019 -0500
Commit:     Josh Boyer
CommitDate: Thu May 2 06:24:19 2019 -0400

    amdgpu: update vega20 to the latest 19.10 firmware

As was the previous generation:

$ git log -1 --format=fuller polaris*
commit 4ea5c73b96ed4a508f90047e22ccbaa477481310
Author:     Alex Deucher
AuthorDate: Mon Apr 29 08:47:55 2019 -0500
Commit:     Josh Boyer
CommitDate: Thu May 2 06:23:54 2019 -0400

    amdgpu: update polaris11 to the latest 19.10 firmware

Even cards two generations older have a newer update:

$ git log -1 --format=fuller tonga_* fiji_* carrizo_* stoney_*
commit fcd5a5f14abf1c0202abb8dc6b98ddb2ff23c359
Author:     Alex Deucher
AuthorDate: Tue Oct 23 16:35:58 2018 -0500
Commit:     Josh Boyer
CommitDate: Fri Oct 26 08:08:40 2018 -0400

    amdgpu: update fiji firmware to 18.40
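The gap between the firmware updates listed in comment #14 can be put into numbers. The following is a minimal sketch, not part of the original report: it hard-codes the four CommitDate values quoted above and computes how many days each firmware family lags behind the newest one. The names `COMMIT_DATES`, `parse_git_date` and `days_behind_newest` are made up for illustration, and the date format string assumes git's default date layout and an English/C locale for the weekday and month names.

```python
from datetime import datetime

# Layout of the CommitDate lines printed by `git log --format=fuller`
# (assumes git's default date format and English day/month names).
GIT_DATE_FORMAT = "%a %b %d %H:%M:%S %Y %z"

# CommitDate values copied from the git log output quoted in comment #14,
# keyed by the file glob that was passed to git log.
COMMIT_DATES = {
    "vegam*": "Tue Jul 17 07:54:55 2018 -0400",
    "vega*": "Thu May 2 06:24:19 2019 -0400",
    "polaris*": "Thu May 2 06:23:54 2019 -0400",
    "tonga_* fiji_* carrizo_* stoney_*": "Fri Oct 26 08:08:40 2018 -0400",
}


def parse_git_date(text):
    """Turn a git CommitDate string into a timezone-aware datetime."""
    return datetime.strptime(text, GIT_DATE_FORMAT)


def days_behind_newest(commit_dates):
    """Map each glob to the whole days its last commit trails the newest one."""
    parsed = {glob: parse_git_date(date) for glob, date in commit_dates.items()}
    newest = max(parsed.values())
    return {glob: (newest - when).days for glob, when in parsed.items()}


if __name__ == "__main__":
    for glob, days in sorted(days_behind_newest(COMMIT_DATES).items(),
                             key=lambda item: item[1]):
        print(f"{days:4d} days behind newest: {glob}")
```

With the dates from the thread, the VegaM entry ends up roughly nine to ten months behind the Vega and Polaris updates, which matches the "almost a year" estimate in the comment.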
[Mesa-dev] [Bug 108900] [KBL-G][Vulkan] Non-recoverable GPU hangs with GfxBench v5 Aztec Ruins Vulkan test
https://bugs.freedesktop.org/show_bug.cgi?id=108900

Samuel Pitoiset changed:

What       |Removed |Added
Status     |NEW     |RESOLVED
Resolution |---     |WONTFIX

--- Comment #13 from Samuel Pitoiset ---
I asked for a copy of this benchmark (the Linux version) and they told me that it will no longer be supported, so don't expect further releases of Aztec. Closing because we just can't debug it.
[Mesa-dev] [Bug 108900] [KBL-G][Vulkan] Non-recoverable GPU hangs with GfxBench v5 Aztec Ruins Vulkan test
https://bugs.freedesktop.org/show_bug.cgi?id=108900

--- Comment #12 from Eero Tamminen ---
(In reply to Samuel Pitoiset from comment #11)
> First time I see a shader like that...
>
> Can you install spirv-dis and generate a new hang report, please? The SPIR-V
> is probably useful too.

Sorry, shortly after filing this bug, I stopped having extra time for 3D bugs. I'll still file bugs on larger issues I notice, and can verify the fixed ones, but not spend time investigating them (it would have helped if spirv-tools had been available in Ubuntu :-/).
[Mesa-dev] [Bug 108900] [KBL-G][Vulkan] Non-recoverable GPU hangs with GfxBench v5 Aztec Ruins Vulkan test
https://bugs.freedesktop.org/show_bug.cgi?id=108900

--- Comment #11 from Samuel Pitoiset ---
First time I see a shader like that...

Can you install spirv-dis and generate a new hang report, please? The SPIR-V is probably useful too.
[Mesa-dev] [Bug 108900] [KBL-G][Vulkan] Non-recoverable GPU hangs with GfxBench v5 Aztec Ruins Vulkan test
https://bugs.freedesktop.org/show_bug.cgi?id=108900

--- Comment #10 from Eero Tamminen ---
Created attachment 143573
  --> https://bugs.freedesktop.org/attachment.cgi?id=143573&action=edit
70MB output from the RADV debug options (compressed)
[Mesa-dev] [Bug 108900] [KBL-G][Vulkan] Non-recoverable GPU hangs with GfxBench v5 Aztec Ruins Vulkan test
https://bugs.freedesktop.org/show_bug.cgi?id=108900

--- Comment #9 from Eero Tamminen ---
Created attachment 143572
  --> https://bugs.freedesktop.org/attachment.cgi?id=143572&action=edit
Hang trace

(In reply to Samuel Pitoiset from comment #8)
> Again, without the demo it is hard to fix.

While GfxBench v5 / AztecRuins still seems to be proprietary for Desktop Linux (available for free only on Windows & Android), the (recoverable) Manhattan hangs in bug 108898 can be tested with the public GfxBench v4 version.

> Can you try 'export RADV_DEBUG=nodcc,nohiz,zerovram,nofastclears' ?
>
> If it still hangs

Yes, it still hangs, just less verbosely.

dmesg:
[ 546.116535] amdgpu :01:00.0: GPU fault detected: 146 0x0fa0880c for process testfw_app pid 1859 thread testfw_app pid 1860
[ 546.116538] amdgpu :01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x001001F4
[ 546.116539] amdgpu :01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0808800C
[ 546.116541] amdgpu :01:00.0: VM fault (0x0c, vmid 4, pasid 32772) at page 1049076, read from 'TC4' (0x54433400) (136)
[ 556.201073] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=11253, emitted seq=11254
[ 556.201101] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process testfw_app pid 1859 thread testfw_app pid 1860
[ 556.201104] amdgpu :01:00.0: GPU reset begin!
[ 556.616910] cp is busy, skip halt cp
[ 556.805398] rlc is busy, skip halt rlc
[ 556.806410] amdgpu :01:00.0: GPU pci config reset
[ 556.818925] amdgpu :01:00.0: GPU reset succeeded, trying to resume
[ 556.818962] [drm] PCIE GART of 256M enabled (table at 0x00F4007E9000).
[ 556.818991] [drm:amdgpu_device_gpu_recover [amdgpu]] *ERROR* VRAM is lost!
[ 556.896623] [drm] UVD and UVD ENC initialized successfully.
[ 556.997551] [drm] VCE initialized successfully.
[ 557.007168] [drm] recover vram bo from shadow start
[ 557.012867] [drm] recover vram bo from shadow done
[ 557.012869] [drm] Skip scheduling IBs!
[ 557.012956] amdgpu :01:00.0: GPU reset(2) succeeded!
[ 557.013063] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
...

Application:
--
Warm up
Generate SH shader...
Workgroup size: 8
compile deferred_irradiance_volumes/m_envprobe_generate_sh_compute.shader... done
amdgpu: radv_amdgpu_cs_query_fence_status failed.
glVkError: 2 line: 4329 func: Finish
amdgpu: radv_amdgpu_cs_query_fence_status failed.
glVkError: 2 line: 4219 func: BeginCommandBuffer
amdgpu: The CS has been rejected, see dmesg for more information.
vk: error: failed to submit CS 0
--

> generating a hang report might help
> export RADV_TRACE_FILE=$HOME/hang.trace
> export RADV_DEBUG=allbos,vmfaults,zerovram,syncshaders

Hang trace attached.
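The numbers in the VM fault lines above are related to each other, which can help when comparing fault reports. The sketch below is an illustration, not something taken from the thread: it shows that the reported page number is simply the FAULT_ADDR register in decimal, that the client name is the ASCII spelling of the 32-bit client value, and that the protections, vmid and client id can be recovered from FAULT_STATUS. The bit positions in `decode_status` are an assumption, chosen because they reproduce the values printed in the dmesg lines of comments #7 and #9; the function names are made up for this example.

```python
def decode_status(status):
    """Split a VM_CONTEXT1_PROTECTION_FAULT_STATUS value into fields.

    The bit layout is an assumption that matches the dmesg lines in this
    thread; it is not quoted from driver documentation.
    """
    return {
        "protections": status & 0xFF,         # printed as 0x0c / 0x2c
        "client_id": (status >> 12) & 0x1FF,  # printed as (136) / (392)
        "write": bool((status >> 24) & 0x1),  # False is printed as "read"
        "vmid": (status >> 25) & 0xF,         # printed as "vmid 4" / "vmid 5"
    }


def client_tag(mc_client):
    """Decode the 32-bit client value into its ASCII name, e.g. 'TC4'."""
    return mc_client.to_bytes(4, "big").rstrip(b"\x00").decode("ascii")


def describe_fault(addr, status, mc_client):
    """Rebuild a one-line summary from the three raw values in the log."""
    fields = decode_status(status)
    access = "write" if fields["write"] else "read"
    return (f"VM fault (0x{fields['protections']:02x}, vmid {fields['vmid']}) "
            f"at page {addr}, {access} from '{client_tag(mc_client)}' "
            f"({fields['client_id']})")


if __name__ == "__main__":
    # Raw values from the first fault in the dmesg output of comment #9.
    print(describe_fault(0x001001F4, 0x0808800C, 0x54433400))
```

Read this way, the faults in comments #7 and #9 hit the same page (1049076, i.e. 0x001001F4) from the same client, which suggests the debug flags changed how many faults were logged rather than where the first one occurred.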
[Mesa-dev] [Bug 108900] [KBL-G][Vulkan] Non-recoverable GPU hangs with GfxBench v5 Aztec Ruins Vulkan test
https://bugs.freedesktop.org/show_bug.cgi?id=108900

--- Comment #8 from Samuel Pitoiset ---
Again, without the demo it is hard to fix.

Can you try 'export RADV_DEBUG=nodcc,nohiz,zerovram,nofastclears' ?

If it still hangs, generating a hang report might help:

export RADV_TRACE_FILE=$HOME/hang.trace
export RADV_DEBUG=allbos,vmfaults,zerovram,syncshaders
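For repeated test runs, the two environment setups from comment #8 could be wrapped in a small launcher instead of being exported by hand. This is only a sketch under assumptions: the flag lists are copied from the comment as written (they may not exist in other Mesa versions), while the helper names and the command passed to the launcher are hypothetical.

```python
import os
import subprocess

# RADV_DEBUG values exactly as suggested in comment #8.
WORKAROUND_FLAGS = ["nodcc", "nohiz", "zerovram", "nofastclears"]
HANG_REPORT_FLAGS = ["allbos", "vmfaults", "zerovram", "syncshaders"]


def radv_env(flags, trace_file=None, base=None):
    """Return a copy of the environment with the RADV variables set."""
    env = dict(os.environ if base is None else base)
    env["RADV_DEBUG"] = ",".join(flags)
    if trace_file is not None:
        env["RADV_TRACE_FILE"] = trace_file
    return env


def run_with_hang_report(argv):
    """Run argv with the hang-report settings and return its exit code."""
    trace_file = os.path.join(os.path.expanduser("~"), "hang.trace")
    env = radv_env(HANG_REPORT_FLAGS, trace_file)
    return subprocess.run(argv, env=env).returncode
```

A call such as `run_with_hang_report(["./testfw_app"])` would correspond to the second pair of exports; the binary path here is a placeholder, since the thread only shows the process name. Passing `WORKAROUND_FLAGS` to `radv_env` without a trace file corresponds to the first suggestion.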
[Mesa-dev] [Bug 108900] [KBL-G][Vulkan] Non-recoverable GPU hangs with GfxBench v5 Aztec Ruins Vulkan test
https://bugs.freedesktop.org/show_bug.cgi?id=108900

Bug 108900 depends on bug 109920, which changed state.

Bug 109920 Summary: "NIR validation failed in internal shader" abort with all Vulkan test-cases
https://bugs.freedesktop.org/show_bug.cgi?id=109920

What       |Removed |Added
Status     |NEW     |RESOLVED
Resolution |---     |DUPLICATE
[Mesa-dev] [Bug 108900] [KBL-G][Vulkan] Non-recoverable GPU hangs with GfxBench v5 Aztec Ruins Vulkan test
https://bugs.freedesktop.org/show_bug.cgi?id=108900

Eero Tamminen changed:

What       |Removed                    |Added
Depends on |                           |109920
Summary    |Non-recoverable GPU hangs  |[KBL-G][Vulkan]
           |with GfxBench v5 Aztec     |Non-recoverable GPU hangs
           |Ruins Vulkan test          |with GfxBench v5 Aztec
           |                           |Ruins Vulkan test

--- Comment #7 from Eero Tamminen ---
Still hangs with latest drm-tip kernel v5.0 and Mesa git from yesterday:

[28648.228909] amdgpu :01:00.0: GPU fault detected: 146 0x0fa0880c for process testfw_app pid 17622 thread testfw_app pid 17623
[28648.228911] amdgpu :01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x001001F4
[28648.228912] amdgpu :01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0A08800C
[28648.228914] amdgpu :01:00.0: VM fault (0x0c, vmid 5, pasid 32772) at page 1049076, read from 'TC4' (0x54433400) (136)
[28648.228920] amdgpu :01:00.0: GPU fault detected: 146 0x0fa0840c for process testfw_app pid 17622 thread testfw_app pid 17623
[28648.228921] amdgpu :01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x001001FE
[28648.228922] amdgpu :01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0A08800C
[28648.228923] amdgpu :01:00.0: VM fault (0x0c, vmid 5, pasid 32772) at page 1049086, read from 'TC4' (0x54433400) (136)
[28648.229002] amdgpu :01:00.0: GPU fault detected: 146 0x0fb1880c for process testfw_app pid 17622 thread testfw_app pid 17623
[28648.229003] amdgpu :01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x001001F5
[28648.229004] amdgpu :01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0A18802C
[28648.229005] amdgpu :01:00.0: VM fault (0x2c, vmid 5, pasid 32772) at page 1049077, read from 'TC0' (0x54433000) (392)
[28657.458604] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out.
[28658.492490] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=164402, emitted seq=164404
[28658.492519] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process testfw_app pid 17622 thread testfw_app pid 17623
[28658.492521] amdgpu :01:00.0: GPU reset begin!
[28662.578695] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out.
[28663.014035] cp is busy, skip halt cp
[28663.202474] rlc is busy, skip halt rlc
[28663.203486] amdgpu :01:00.0: GPU pci config reset
[28663.215357] amdgpu :01:00.0: GPU reset succeeded, trying to resume
[28663.215407] [drm] PCIE GART of 256M enabled (table at 0x00F4007E9000).
[28663.215444] [drm:amdgpu_device_gpu_recover [amdgpu]] *ERROR* VRAM is lost!
[28663.292813] [drm] UVD and UVD ENC initialized successfully.
[28663.393739] [drm] VCE initialized successfully.
[28663.403269] [drm] recover vram bo from shadow start
[28663.408650] [drm] recover vram bo from shadow done
[28663.408651] [drm] Skip scheduling IBs!
[28663.408652] [drm] Skip scheduling IBs!
[28663.408669] [drm] Skip scheduling IBs!
[28663.408709] amdgpu :01:00.0: GPU reset(2) succeeded!
[28663.408849] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[28663.452225] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
...

Referenced Bugs:

https://bugs.freedesktop.org/show_bug.cgi?id=109920
[Bug 109920] "NIR validation failed in internal shader" abort with all Vulkan test-cases