Joerg,

I did run with 5.9.3. After about 2 hours in a reboot cycle the system failed again with amdgpu problems.

> please try booting with "pci=noats" on the kernel command line.

This I will do next.

Best regards,
Edgar

-----Original Message-----
From: Merger, Edgar [AUTOSOL/MAS/AUGS]
Sent: Wednesday, 4 November 2020 15:36
To: '[email protected]' <[email protected]>
Cc: '[email protected]' <[email protected]>
Subject: RE: [EXTERNAL] Re: amdgpu error whenever IOMMU is enabled

Joerg,

One remark:

> However I found out that with Kernel 5.9.3 the amdgpu kernel module is
> not loaded/installed

That is likely my fault, because I was compiling that Linux kernel on a faster machine (V1807B CPU, whereas the target has an R1305G CPU). I restarted the compile just now on the target machine to avoid any problems.

Best regards,
Edgar

-----Original Message-----
From: Merger, Edgar [AUTOSOL/MAS/AUGS]
Sent: Wednesday, 4 November 2020 15:19
To: [email protected]
Cc: [email protected]
Subject: RE: [EXTERNAL] Re: amdgpu error whenever IOMMU is enabled

> Yes, but it could be the same underlying reason.

There is no PCI setup issue that we are aware of.

> For a first try, use 5.9.3. If it reproduces there, please try booting with
> "pci=noats" on the kernel command line.

I compiled kernel 5.9.3 and started a reboot test to see if it is going to fail again. However, I found out that with kernel 5.9.3 the amdgpu kernel module is not loaded/installed, so I don't think further investigation makes sense in this state. I might have done something wrong when compiling kernel 5.9.3: I reused the .config file from 5.4.0-47 to configure it. However, I do not know why it did not install amdgpu.

> Please also send me the output of 'lspci -vvv' and 'lspci -t' of the machine
> where this happens.

For comparison I attached the logs when using 5.4.0-47 and 5.9.3.

Best regards,
Edgar

-----Original Message-----
From: [email protected] <[email protected]>
Sent: Wednesday, 4 November 2020 11:15
To: Merger, Edgar [AUTOSOL/MAS/AUGS] <[email protected]>
Cc: [email protected]
Subject: Re: [EXTERNAL] Re: amdgpu error whenever IOMMU is enabled

On Wed, Nov 04, 2020 at 09:21:35AM +0000, Merger, Edgar [AUTOSOL/MAS/AUGS] wrote:
> AMD-Vi: Completion-Wait loop timed out is at [65499.964105] but amdgpu-error
> is at [ 52.772273], hence much earlier.

Yes, but it could be the same underlying reason.

> Have not tried to use an upstream kernel yet. Which one would you recommend?

For a first try, use 5.9.3. If it reproduces there, please try booting with
"pci=noats" on the kernel command line.

Please also send me the output of 'lspci -vvv' and 'lspci -t' of the machine
where this happens.

Regards,

Joerg

>
> As far as inconsistencies in the PCI setup are concerned, the only thing that
> I know of right now is that we haven't entered a PCI subsystem vendor and
> device ID yet. It is still "Advanced Micro Devices". We will change that soon
> to "General Electric" or "Emerson".
>
> Best regards,
> Edgar
>
> -----Original Message-----
> From: [email protected] <[email protected]>
> Sent: Wednesday, 4 November 2020 09:53
> To: Merger, Edgar [AUTOSOL/MAS/AUGS] <[email protected]>
> Cc: [email protected]
> Subject: [EXTERNAL] Re: amdgpu error whenever IOMMU is enabled
>
> Hi Edgar,
>
> On Fri, Oct 30, 2020 at 02:26:23PM +0000, Merger, Edgar [AUTOSOL/MAS/AUGS] wrote:
> > With one board we have a boot problem that is reproducible at about every
> > 50th boot.
> > The system is accessible via ssh and works fine except for the
> > graphics. The graphics is off. We don't see a screen. Please see
> > attached “dmesg.log”. From [52.772273] onwards the kernel reports
> > drm/amdgpu errors. It even tries to reset the GPU but that fails too.
> > I also tried to reset amdgpu with the command “sudo cat
> > /sys/kernel/debug/dri/N/amdgpu_gpu_recover”. That did not help either.
>
> Can you reproduce the problem with an upstream kernel too?
>
> These messages in dmesg indicate some problem in the platform setup:
>
> AMD-Vi: Completion-Wait loop timed out
>
> Might there be some inconsistencies in the PCI setup between the bridges and
> the endpoints or something?
>
> Regards,
>
> Joerg
<<attachment: 5.9.3-fail.zip>>
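
The log comparison discussed in this thread (first amdgpu error versus first AMD-Vi timeout) and the check for "pci=noats" on the kernel command line could be scripted roughly as follows. This is only a minimal sketch in Python. The function names are hypothetical, the AMD-Vi string is the one quoted in the thread, and treating any line that contains both "amdgpu" and "ERROR" as an amdgpu error is an assumption about how the driver words its messages.

```python
import re

# dmesg lines look like "[   52.772273] message"; the number is seconds since boot.
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s*(.*)$")

# Message quoted in the thread.
IOMMU_TIMEOUT = "AMD-Vi: Completion-Wait loop timed out"


def is_amdgpu_error(msg):
    # Assumption: amdgpu error lines mention both "amdgpu" and "ERROR".
    return "amdgpu" in msg and "ERROR" in msg


def is_iommu_timeout(msg):
    return IOMMU_TIMEOUT in msg


# Hypothetical labels mapped to the predicates above.
CHECKS = {
    "amdgpu_error": is_amdgpu_error,
    "iommu_timeout": is_iommu_timeout,
}


def first_occurrences(lines):
    """Return {label: (timestamp, message)} for the first line matching each check."""
    found = {}
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip lines without a dmesg timestamp
        ts, msg = float(m.group(1)), m.group(2)
        for label, check in CHECKS.items():
            if label not in found and check(msg):
                found[label] = (ts, msg)
    return found


def has_pci_noats(cmdline):
    """True if 'pci=noats' is one of the whitespace-separated boot options."""
    return "pci=noats" in cmdline.split()
```

Feeding first_occurrences() the lines of a saved dmesg.log shows which of the two errors appears first in that boot, and has_pci_noats() applied to the contents of /proc/cmdline confirms whether the option was actually active for the run being examined.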
_______________________________________________
iommu mailing list
[email protected]
https://lists.linuxfoundation.org/mailman/listinfo/iommu
