On Mon, 2017-06-26 at 14:14 +0200, Joerg Roedel wrote:
> On Fri, Jun 23, 2017 at 10:20:47AM -0400, Jan Vesely wrote:
> > I was able to trigger "Completion-Wait loop timed out" messages in the
> > following situation:
> > A hung OpenCL task is running on the dGPU.
> > The dGPU goes to sleep.
> > SIGTERM is sent to the hung task.
> > It seems to recover OK after the dGPU is powered back on.
> 
> How does that 'dGPU goes to sleep' work? Do you put it to sleep manually
> via sysfs or something? Or is that something that amdgpu does on its
> own?

AMD folks should be able to provide more details. AFAIK, the driver
uses ACPI methods to power the device on and off. Driver routines wake
the device up before accessing it, and there is a timeout that turns it
off after a few seconds of inactivity.

> 
> It looks like the GPU just switches the ATS unit off when it goes to
> sleep and doesn't answer the invalidation anymore, which explains the
> completion-wait timeouts.

Both MMIO registers and PCIe config registers are turned off, so it
would not surprise me if the device ignored all PCIe requests in the off
state. It should be possible to request a device wake-up before
invalidating the relevant IOMMU domain. I'll leave it to more
knowledgeable people to judge whether that's a good idea (we could also
postpone such invalidations until the device is woken by other means).


Jan

> 
> 
> 
>       Joerg
> 

-- 
Jan Vesely <[email protected]>


_______________________________________________
iommu mailing list
[email protected]
https://lists.linuxfoundation.org/mailman/listinfo/iommu
