[Mesa-dev] [Bug 108900] [KBL-G][Vulkan] Non-recoverable GPU hangs with GfxBench v5 Aztec Ruins Vulkan test

2019-05-09 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=108900

--- Comment #15 from Alex Deucher  ---
(In reply to Eero Tamminen from comment #14)
> 
> After looking at the kernel firmware repo, I wonder whether the problem is
> the firmware:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/
> 
> VegaM hasn't been updated since it was added, almost a year ago:

It hasn't been updated because there have not been any changes internally.  I
always update all asic firmware when updates are available.

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [Bug 108900] [KBL-G][Vulkan] Non-recoverable GPU hangs with GfxBench v5 Aztec Ruins Vulkan test

2019-05-09 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=108900

Eero Tamminen  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |VERIFIED

--- Comment #14 from Eero Tamminen  ---
(In reply to Samuel Pitoiset from comment #13)
> I asked for a copy of this benchmark (the Linux version) and they told me
> that it will no longer be supported, so don't expect further releases of Aztec.
> Closing because we just can't debug it.

After looking at the kernel firmware repo, I wonder whether the problem is
the firmware:
  https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/

VegaM hasn't been updated since it was added, almost a year ago:

$ git log --format=fuller vegam*
commit 153a51e438cafe07610b28db0304b1721b91d847
Author: Alex Deucher 
AuthorDate: Tue Jul 10 15:53:14 2018 -0500
Commit: Josh Boyer 
CommitDate: Tue Jul 17 07:54:55 2018 -0400

amdgpu: add initial VegaM firmware


Whereas other Vega firmware was updated this month:

$ git log -1 --format=fuller vega*
commit 92e17d0dd2437140fab044ae62baf69b35d7d1fa (HEAD -> master, origin/master,
origin/HEAD)
Author: Alex Deucher 
AuthorDate: Mon Apr 29 08:50:27 2019 -0500
Commit: Josh Boyer 
CommitDate: Thu May 2 06:24:19 2019 -0400

amdgpu: update vega20 to the latest 19.10 firmware


As was the previous generation:

$ git log -1 --format=fuller polaris*
commit 4ea5c73b96ed4a508f90047e22ccbaa477481310
Author: Alex Deucher 
AuthorDate: Mon Apr 29 08:47:55 2019 -0500
Commit: Josh Boyer 
CommitDate: Thu May 2 06:23:54 2019 -0400

amdgpu: update polaris11 to the latest 19.10 firmware


Even cards two generations older have a newer update:

$ git log -1 --format=fuller tonga_* fiji_* carrizo_* stoney_*
commit fcd5a5f14abf1c0202abb8dc6b98ddb2ff23c359
Author: Alex Deucher 
AuthorDate: Tue Oct 23 16:35:58 2018 -0500
Commit: Josh Boyer 
CommitDate: Fri Oct 26 08:08:40 2018 -0400

amdgpu: update fiji firmware to 18.40



[Mesa-dev] [Bug 108900] [KBL-G][Vulkan] Non-recoverable GPU hangs with GfxBench v5 Aztec Ruins Vulkan test

2019-05-09 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=108900

Samuel Pitoiset  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |WONTFIX

--- Comment #13 from Samuel Pitoiset  ---
I asked for a copy of this benchmark (the Linux version) and they told me that
it will no longer be supported, so don't expect further releases of Aztec.
Closing because we just can't debug it.


[Mesa-dev] [Bug 108900] [KBL-G][Vulkan] Non-recoverable GPU hangs with GfxBench v5 Aztec Ruins Vulkan test

2019-03-07 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=108900

--- Comment #12 from Eero Tamminen  ---
(In reply to Samuel Pitoiset from comment #11)
> This is the first time I've seen a shader like that...
> 
> Can you install spirv-dis and generate a new hang report, please? The SPIR-V
> is probably useful too.

Sorry, shortly after filing this bug, I stopped having extra time for 3D bugs.
I'll still file bugs on larger issues I notice, and can verify the fixed ones,
but I can't spend time investigating them (it would have helped if spirv-tools
had been available in Ubuntu :-/).


[Mesa-dev] [Bug 108900] [KBL-G][Vulkan] Non-recoverable GPU hangs with GfxBench v5 Aztec Ruins Vulkan test

2019-03-07 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=108900

--- Comment #11 from Samuel Pitoiset  ---
This is the first time I've seen a shader like that...

Can you install spirv-dis and generate a new hang report, please? The SPIR-V is
probably useful too.


[Mesa-dev] [Bug 108900] [KBL-G][Vulkan] Non-recoverable GPU hangs with GfxBench v5 Aztec Ruins Vulkan test

2019-03-07 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=108900

--- Comment #10 from Eero Tamminen  ---
Created attachment 143573
  --> https://bugs.freedesktop.org/attachment.cgi?id=143573&action=edit
70MB output from the RADV debug options (compressed)


[Mesa-dev] [Bug 108900] [KBL-G][Vulkan] Non-recoverable GPU hangs with GfxBench v5 Aztec Ruins Vulkan test

2019-03-07 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=108900

--- Comment #9 from Eero Tamminen  ---
Created attachment 143572
  --> https://bugs.freedesktop.org/attachment.cgi?id=143572&action=edit
Hang trace

(In reply to Samuel Pitoiset from comment #8)
> Again, without the demo it is hard to fix.

While GfxBench v5 / Aztec Ruins still seems to be proprietary on desktop Linux
(available for free only on Windows & Android), the (recoverable) Manhattan hangs
in bug 108898 can be tested with the public GfxBench v4 version.


> Can you try 'export RADV_DEBUG=nodcc,nohiz,zerovram,nofastclears' ?
> 
> If it still hangs

Yes, it still hangs, just less verbosely.

dmesg:
[  546.116535] amdgpu :01:00.0: GPU fault detected: 146 0x0fa0880c for
process testfw_app pid 1859 thread testfw_app pid 1860
[  546.116538] amdgpu :01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR  
0x001001F4
[  546.116539] amdgpu :01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS
0x0808800C
[  546.116541] amdgpu :01:00.0: VM fault (0x0c, vmid 4, pasid 32772) at
page 1049076, read from 'TC4' (0x54433400) (136)
[  556.201073] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout,
signaled seq=11253, emitted seq=11254
[  556.201101] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information:
process testfw_app pid 1859 thread testfw_app pid 1860
[  556.201104] amdgpu :01:00.0: GPU reset begin!
[  556.616910] cp is busy, skip halt cp
[  556.805398] rlc is busy, skip halt rlc
[  556.806410] amdgpu :01:00.0: GPU pci config reset
[  556.818925] amdgpu :01:00.0: GPU reset succeeded, trying to resume
[  556.818962] [drm] PCIE GART of 256M enabled (table at 0x00F4007E9000).
[  556.818991] [drm:amdgpu_device_gpu_recover [amdgpu]] *ERROR* VRAM is lost!
[  556.896623] [drm] UVD and UVD ENC initialized successfully.
[  556.997551] [drm] VCE initialized successfully.
[  557.007168] [drm] recover vram bo from shadow start
[  557.012867] [drm] recover vram bo from shadow done
[  557.012869] [drm] Skip scheduling IBs!
[  557.012956] amdgpu :01:00.0: GPU reset(2) succeeded!
[  557.013063] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
...

Application:
--
Warm up Generate SH shader...

Workgroup size: 8
compile deferred_irradiance_volumes/m_envprobe_generate_sh_compute.shader...
done

amdgpu: radv_amdgpu_cs_query_fence_status failed.
glVkError: 2 line: 4329 func: Finish
amdgpu: radv_amdgpu_cs_query_fence_status failed.
glVkError: 2 line: 4219 func: BeginCommandBuffer
amdgpu: The CS has been rejected, see dmesg for more information.
vk: error: failed to submit CS 0
--


> generating a hang report might help

> export RADV_TRACE_FILE=$HOME/hang.trace
> export RADV_DEBUG=allbos,vmfaults,zerovram,syncshaders

Hang trace attached.


[Mesa-dev] [Bug 108900] [KBL-G][Vulkan] Non-recoverable GPU hangs with GfxBench v5 Aztec Ruins Vulkan test

2019-03-06 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=108900

--- Comment #8 from Samuel Pitoiset  ---
Again, without the demo it is hard to fix.
Can you try 'export RADV_DEBUG=nodcc,nohiz,zerovram,nofastclears' ?

If it still hangs, generating a hang report might help

export RADV_TRACE_FILE=$HOME/hang.trace
export RADV_DEBUG=allbos,vmfaults,zerovram,syncshaders
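
The two exports above affect every Vulkan program started from that shell
afterwards. As a rough sketch (not something from the RADV documentation), the
same settings can be scoped to a single run with a small bash helper. The
function name run_with_radv_trace and the benchmark path in the usage comment
are made up for illustration; only the variable names and values come from
this comment.

```bash
#!/usr/bin/env bash
# Sketch only: run_with_radv_trace is a hypothetical helper name.
# Usage: run_with_radv_trace <trace-file> <command> [args...]
run_with_radv_trace() {
    local trace_file=$1
    shift
    # Set the variables only for the launched command, not for the whole shell.
    RADV_TRACE_FILE="$trace_file" \
    RADV_DEBUG=allbos,vmfaults,zerovram,syncshaders \
        "$@"
}

# Example (the benchmark path is a placeholder, not taken from the report):
# run_with_radv_trace "$HOME/hang.trace" ./testfw_app
```

Scoping the variables this way keeps the debug options from affecting later,
unrelated runs in the same terminal.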


[Mesa-dev] [Bug 108900] [KBL-G][Vulkan] Non-recoverable GPU hangs with GfxBench v5 Aztec Ruins Vulkan test

2019-03-06 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=108900
Bug 108900 depends on bug 109920, which changed state.

Bug 109920 Summary: "NIR validation failed in internal shader" abort with all 
Vulkan test-cases
https://bugs.freedesktop.org/show_bug.cgi?id=109920

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |DUPLICATE


[Mesa-dev] [Bug 108900] [KBL-G][Vulkan] Non-recoverable GPU hangs with GfxBench v5 Aztec Ruins Vulkan test

2019-03-06 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=108900

Eero Tamminen  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Depends on|                            |109920
            Summary|Non-recoverable GPU hangs   |[KBL-G][Vulkan]
                   |with GfxBench v5 Aztec      |Non-recoverable GPU hangs
                   |Ruins Vulkan test           |with GfxBench v5 Aztec
                   |                            |Ruins Vulkan test

--- Comment #7 from Eero Tamminen  ---
Still hangs with latest drm-tip kernel v5.0 and Mesa git from yesterday:

[28648.228909] amdgpu :01:00.0: GPU fault detected: 146 0x0fa0880c for
process testfw_app pid 17622 thread testfw_app pid 17623
[28648.228911] amdgpu :01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR  
0x001001F4
[28648.228912] amdgpu :01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS
0x0A08800C
[28648.228914] amdgpu :01:00.0: VM fault (0x0c, vmid 5, pasid 32772) at
page 1049076, read from 'TC4' (0x54433400) (136)
[28648.228920] amdgpu :01:00.0: GPU fault detected: 146 0x0fa0840c for
process testfw_app pid 17622 thread testfw_app pid 17623
[28648.228921] amdgpu :01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR  
0x001001FE
[28648.228922] amdgpu :01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS
0x0A08800C
[28648.228923] amdgpu :01:00.0: VM fault (0x0c, vmid 5, pasid 32772) at
page 1049086, read from 'TC4' (0x54433400) (136)
[28648.229002] amdgpu :01:00.0: GPU fault detected: 146 0x0fb1880c for
process testfw_app pid 17622 thread testfw_app pid 17623
[28648.229003] amdgpu :01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR  
0x001001F5
[28648.229004] amdgpu :01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS
0x0A18802C
[28648.229005] amdgpu :01:00.0: VM fault (0x2c, vmid 5, pasid 32772) at
page 1049077, read from 'TC0' (0x54433000) (392)
[28657.458604] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for
fences timed out.
[28658.492490] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout,
signaled seq=164402, emitted seq=164404
[28658.492519] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information:
process testfw_app pid 17622 thread testfw_app pid 17623
[28658.492521] amdgpu :01:00.0: GPU reset begin!
[28662.578695] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for
fences timed out.
[28663.014035] cp is busy, skip halt cp
[28663.202474] rlc is busy, skip halt rlc
[28663.203486] amdgpu :01:00.0: GPU pci config reset
[28663.215357] amdgpu :01:00.0: GPU reset succeeded, trying to resume
[28663.215407] [drm] PCIE GART of 256M enabled (table at 0x00F4007E9000).
[28663.215444] [drm:amdgpu_device_gpu_recover [amdgpu]] *ERROR* VRAM is lost!
[28663.292813] [drm] UVD and UVD ENC initialized successfully.
[28663.393739] [drm] VCE initialized successfully.
[28663.403269] [drm] recover vram bo from shadow start
[28663.408650] [drm] recover vram bo from shadow done
[28663.408651] [drm] Skip scheduling IBs!
[28663.408652] [drm] Skip scheduling IBs!
[28663.408669] [drm] Skip scheduling IBs!
[28663.408709] amdgpu :01:00.0: GPU reset(2) succeeded!
[28663.408849] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
[28663.452225] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser -125!
...
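
For anyone cross-reading the fault lines in the log above: the decimal "page"
number is the VM_CONTEXT1_PROTECTION_FAULT_ADDR value printed in hex, and the
client tag in parentheses is the quoted name packed as ASCII bytes. A small
bash sketch of that arithmetic follows. The helper names are hypothetical, and
the 4 KiB GPU page size used for the byte address is an assumption, not
something stated in this report.

```bash
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical; 4 KiB GPU pages are an assumption.

# Print a decimal page number in the hex form used by the FAULT_ADDR lines.
page_to_hex() {
    printf '0x%08X\n' "$1"
}

# Convert a page number to a byte address, assuming 4 KiB (1 << 12) pages.
page_to_addr() {
    printf '0x%X\n' "$(( $1 << 12 ))"
}

# Decode a packed client tag (e.g. 0x54433400) into its ASCII characters,
# skipping zero padding bytes.
decode_client() {
    local value=$(( $1 )) shift_bits byte out=""
    for shift_bits in 24 16 8 0; do
        byte=$(( (value >> shift_bits) & 0xFF ))
        if (( byte != 0 )); then
            out+=$(printf "\\x$(printf '%02x' "$byte")")
        fi
    done
    printf '%s\n' "$out"
}

# Values taken from the first and third faults in the log:
# page_to_hex 1049076       prints 0x001001F4
# page_to_addr 1049076      prints 0x1001F4000
# decode_client 0x54433400  prints TC4
# decode_client 0x54433000  prints TC0
```

The three faults therefore hit pages 0x1001F4, 0x1001FE and 0x1001F5, all
within a few pages of each other.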



Referenced Bugs:

https://bugs.freedesktop.org/show_bug.cgi?id=109920
[Bug 109920] "NIR validation failed in internal shader" abort with all Vulkan
test-cases