Joerg,

One remark: with the kernel parameter pci=noats, dmesg shows:

[   10.128463] kfd kfd: Error initializing iommuv2

Best regards,
Edgar

-----Original Message-----
From: Merger, Edgar [AUTOSOL/MAS/AUGS] 
Sent: Thursday, 5 November 2020 12:16
To: '[email protected]' <[email protected]>
Cc: '[email protected]' <[email protected]>
Subject: RE: [EXTERNAL] Re: amdgpu error whenever IOMMU is enabled

Joerg,

I ran 5.9.3. After about 2 hours of reboot cycling, the system failed again 
with amdgpu problems.

> please try booting with "pci=noats" on the kernel command line.
I will do this next.

Best regards,
Edgar

-----Original Message-----
From: Merger, Edgar [AUTOSOL/MAS/AUGS]
Sent: Wednesday, 4 November 2020 15:36
To: '[email protected]' <[email protected]>
Cc: '[email protected]' <[email protected]>
Subject: RE: [EXTERNAL] Re: amdgpu error whenever IOMMU is enabled

Joerg,

One remark: 
> However I found out that with Kernel 5.9.3 the amdgpu kernel module is 
> not loaded/installed
That is likely my fault, because I compiled that Linux kernel on a faster 
machine (V1807B CPU, versus the R1305G CPU in the target). I have just 
restarted the compile on the target machine to avoid any problems.

Best regards,
Edgar

-----Original Message-----
From: Merger, Edgar [AUTOSOL/MAS/AUGS]
Sent: Wednesday, 4 November 2020 15:19
To: [email protected]
Cc: [email protected]
Subject: RE: [EXTERNAL] Re: amdgpu error whenever IOMMU is enabled

> Yes, but it could be the same underlying reason.
There is no PCI setup issue that we are aware of.

> For a first try, use 5.9.3. If it reproduces there, please try booting with 
> "pci=noats" on the kernel command line.
I compiled kernel 5.9.3 and started a reboot test to see whether it fails 
again. However I found out that with Kernel 5.9.3 the amdgpu kernel module 
is not loaded/installed, so I don't think further investigation makes sense 
this way. I might have done something wrong when compiling Linux kernel 5.9.3. 
I reused the .config file from 5.4.0-47 to configure kernel 5.9.3. However, I 
do not know why it did not install amdgpu.

> Please also send me the output of 'lspci -vvv' and 'lspci -t' of the machine 
> where this happens.
For comparison, I have attached the logs from 5.4.0-47 and 5.9.3.

Best regards,
Edgar

-----Original Message-----
From: [email protected] <[email protected]>
Sent: Wednesday, 4 November 2020 11:15
To: Merger, Edgar [AUTOSOL/MAS/AUGS] <[email protected]>
Cc: [email protected]
Subject: Re: [EXTERNAL] Re: amdgpu error whenever IOMMU is enabled

On Wed, Nov 04, 2020 at 09:21:35AM +0000, Merger, Edgar [AUTOSOL/MAS/AUGS] 
wrote:
> AMD-Vi: Completion-Wait loop timed out is at [65499.964105] but amdgpu-error 
> is at [   52.772273], hence much earlier.

Yes, but it could be the same underlying reason.

> Have not tried to use an upstream kernel yet. Which one would you recommend?

For a first try, use 5.9.3. If it reproduces there, please try booting with 
"pci=noats" on the kernel command line.

Please also send me the output of 'lspci -vvv' and 'lspci -t' of the machine 
where this happens.

Regards,

        Joerg


> 
> As far as inconsistencies in the PCI setup are concerned, the only thing 
> that I know of right now is that we haven't entered a PCI subsystem vendor 
> and device ID yet. It is still "Advanced Micro Devices". We will change that 
> soon to "General Electric" or "Emerson".
> 
> Best regards,
> Edgar
> 
> -----Original Message-----
> From: [email protected] <[email protected]>
> Sent: Wednesday, 4 November 2020 09:53
> To: Merger, Edgar [AUTOSOL/MAS/AUGS] <[email protected]>
> Cc: [email protected]
> Subject: [EXTERNAL] Re: amdgpu error whenever IOMMU is enabled
> 
> Hi Edgar,
> 
> On Fri, Oct 30, 2020 at 02:26:23PM +0000, Merger, Edgar [AUTOSOL/MAS/AUGS] 
> wrote:
> > With one board we have a boot problem that is reproducible roughly once 
> > every 50 boots.
> > The system is accessible via ssh and works fine except for the 
> > graphics. The graphics output is off; we don't see a screen. Please see 
> > the attached “dmesg.log”. From [52.772273] onwards the kernel reports 
> > drm/amdgpu errors. It even tries to reset the GPU, but that fails too.
> > I also tried to reset amdgpu with the command “sudo cat 
> > /sys/kernel/debug/dri/N/amdgpu_gpu_recover”. That did not help either.
> 
> Can you reproduce the problem with an upstream kernel too?
> 
> These messages in dmesg indicate some problem in the platform setup:
> 
>       AMD-Vi: Completion-Wait loop timed out
> 
> Might there be some inconsistencies in the PCI setup between the bridges and 
> the endpoints or something?
> 
> Regards,
> 
>       Joerg

Attachment: dmesg_pci_noats.log
Description: dmesg_pci_noats.log

_______________________________________________
iommu mailing list
[email protected]
https://lists.linuxfoundation.org/mailman/listinfo/iommu
