On Mon, 2 May 2022 16:56:20 +0100
Morris Zuss <[email protected]> wrote:

> *ERROR* sw_init of IP block <gfx_v8_0> failed -2
>
> One thing I did notice is that the amdgpu is being loaded now
> according to lspci -k. Although I cannot seem to unload the amdgpu
> module and when I try to shutdown it just hangs indefinitely unless I
> force it off.
Do you see something on the display?

> After compiling again, the dmesg output was pretty much the same:
> https://termbin.com/x5kx
> amdgpu 0000:07:00.0: amdgpu: gfx8:
> Failed to load firmware "/*(DEBLOBBED)*/"
> [drm:gfx_v8_0_sw_init.cold [amdgpu]]
> *ERROR* Failed to load gfx firmware!
These two prints are due to:
> r = gfx_v8_0_init_microcode(adev);
> if (r) {
>         DRM_ERROR("Failed to load gfx firmware!\n");
>         return r;
> }
in gfx_v8_0_sw_init in amdgpu/gfx_v8_0.c.

And here it seems that the return r wasn't patched correctly for some
reason.

If you modified the Parabola kernel, can you look at the source of
gfx_v8_0.c to verify that the return r has actually been replaced by a
comment?

Several things seem to indicate that:
- The -2 looks like a failed firmware load, since -2 is -ENOENT (No
  such file or directory).
- If loading another firmware had failed, something would have been
  printed after the "Failed to load gfx firmware!" and before the
  "*ERROR* sw_init of IP block <gfx_v8_0> failed -2".
- I found no other way for the code to return -2 if it didn't fail
  there, though there was a lot of code to read, so I could have
  missed something[1].

Another issue could be that you are not running the kernel that
corresponds to the source you patched.

That happened to me very often and it led to a lot of time spent on
impossible debug sessions, so I try to find ways to match the binary
being built with the log from the running kernel.

A way to do that in your case would be to look at when the kernel
image was last created (with ls -l) and compare that with the time at
which the kernel that produced the logs was built, which we can see
with dmesg, at the beginning of the log.

Here's an example:
> [ 0.000000] Linux version 5.15.12-gnu-1-pae
> (linux-libre-pae@parabola) (gcc (GCC) 11.1.0, GNU ld (GNU Binutils)
> 2.36.1) #1 SMP PREEMPT Mon, 10 Jan 2022 01:47:28 +0000

Here that kernel was apparently built on 10 January 2022.
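
The comparison described above can also be scripted. What follows is
only a rough sketch in Python, not something from the kernel or from
Parabola: the function names and the two-hour tolerance are made up
for illustration, and the regular expression only understands the
date format shown in the banner above (other distributions may print
the build date differently).

```python
import os
import re
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

# Build stamp as it appears in the banner quoted above,
# e.g. "Mon, 10 Jan 2022 01:47:28 +0000". Other formats are not handled.
STAMP_RE = re.compile(
    r"[A-Z][a-z]{2}, \d{1,2} [A-Z][a-z]{2} \d{4} \d{2}:\d{2}:\d{2} [+-]\d{4}"
)


def banner_build_time(banner):
    """Extract the build time from a "Linux version ..." banner, or None."""
    match = STAMP_RE.search(banner)
    return parsedate_to_datetime(match.group(0)) if match else None


def image_mtime(path):
    """Last modification time of a kernel image file (what ls -l shows)."""
    return datetime.fromtimestamp(os.stat(path).st_mtime, tz=timezone.utc)


def plausibly_same_build(build_time, image_time, tolerance=timedelta(hours=2)):
    """True if both times are within `tolerance` (an arbitrary guess)."""
    return abs(image_time - build_time) <= tolerance
```

On a running system the same banner can be read from /proc/version,
and the image path (somewhere under /boot on most installations) has
to be filled in by hand. If the image was installed from a package,
its modification time may be the package build or install time rather
than the compile time, so a mismatch is a hint and not a proof.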

Note that GNU/Linux distributions tend to disable the embedded build
time, as having it in the binaries prevents reproducible builds, but
in our case it is very useful, and it seems that Parabola keeps that
information, so we should be good here.

References:
-----------
[1] For the record, here is a more detailed analysis of the code that
    shows that the -2 probably comes from gfx_v8_0_init_microcode:

That print:
> [drm:amdgpu_device_init.cold [amdgpu]]
> *ERROR* sw_init of IP block <gfx_v8_0> failed -2
comes from that code:
> r = adev->ip_blocks[i].version->funcs->sw_init((void *)adev);
> if (r) {
>         DRM_ERROR("sw_init of IP block <%s> failed %d\n",
>                   adev->ip_blocks[i].version->funcs->name, r);
>         goto init_failed;
> }

And that .sw_init function is gfx_v8_0_sw_init (it's similar to the
.hw_init function we had before).

So if your patch worked, the failure would have to happen after the
firmware load and before the end of the gfx_v8_0_sw_init function,
without printing anything in between.

If we remove the parts that print right before returning an error,
we're left with the following code:
> r = amdgpu_ring_init(adev, ring, 1024, &adev->gfx.eop_irq,
>                      AMDGPU_CP_IRQ_GFX_ME0_PIPE0_EOP,
>                      AMDGPU_RING_PRIO_DEFAULT);
> if (r)
>         return r;
This either prints an error or returns something other than -2 /
-ENOENT (No such file or directory), so we can rule that part out.

> r = gfx_v8_0_compute_ring_init(adev, ring_id, i, k, j);
> if (r)
>         return r;
This can't be the culprit either, as it only returns errors from
amdgpu_ring_init, which doesn't return -2 without printing.

> r = amdgpu_gfx_kiq_init_ring(adev, &kiq->ring, &kiq->irq);
> if (r)
>         return r;
This should only return 0 or -22 / -EINVAL, or print errors before
returning, so we can rule that out too.

> r = amdgpu_gfx_mqd_sw_init(adev, sizeof(struct vi_mqd_allocation));
> if (r)
>         return r;
This should either return 0 or print before returning errors, so it's
not that either.
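
As a side note, the error numbers used in this analysis (-2 and -22)
can be checked from userspace. This is a minimal sketch in Python,
assuming it runs on GNU/Linux, where the userspace errno values match
the kernel's for these two codes; the helper name is made up for this
example.

```python
import errno
import os


def describe_kernel_error(code):
    """Map a negative kernel-style return code to (symbolic name, message)."""
    number = -code  # the kernel returns errors as negative errno values
    name = errno.errorcode.get(number, "unknown")
    return name, os.strerror(number)


# The two values that show up in the analysis.
for code in (-2, -22):
    name, message = describe_kernel_error(code)
    print(f"{code}: -{name} ({message})")
```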

And with:
> r = gfx_v8_0_gpu_early_init(adev);
> if (r)
>         return r;
the only interesting code is this:
> case CHIP_POLARIS11:
> case CHIP_POLARIS12:
>         ret = amdgpu_atombios_get_gfx_info(adev);
>         if (ret)
>                 return ret;
>         [...]
>         break;
And that can only return -22 (-EINVAL) or 0.

Denis.
