On Mon, Nov 10, 2025 at 12:06:30PM +0530, Borah, Chaitanya Kumar wrote:
> Hello Jason,
>
> Hope you are doing well. I am Chaitanya from the linux graphics team in
> Intel.
>
> This mail is regarding a regression we are seeing in our CI runs[1] on
> linux-next repository.
>
> Since the version next-20251106 [2], we are seeing our tests timing out
> presumably caused by a GPU Hang.

Thank you for reporting this. I don't have any immediate theory, so I
think it will need some debugging. Maybe Kevin or Lu have some idea?

Some general thoughts to check:

1) Is there an IOMMU fault report? I did not see one in your dmesg, but
   maybe it was truncated? It is more puzzling to see an IOMMU-related
   error and not see a fault report.

2) Could it be that one of the special IOMMU behaviors to support iGPU
   is not working? Maybe we missed one?

3) I seem to recall Lu tested the coherent cache flushing, but that
   would also be a good question: is this iGPU cache incoherent with the
   CPU? Could this be a cache flushing bug? It is very hard to test
   that, so it would not be such a surprise if it has a bug.

4) Nobody has reported any other problems so far, so I'm inclined to
   think the map/unmap is working - but maybe there is some edge case
   the GPU driver is tripping up on?

The lack of a fault report is very puzzling; even if it was #3 I would
think a fault would be the most likely outcome of missing flushing. The
lack of a fault report suggests the wrong physical address was mapped as
present, which points to #4.

Can you investigate a bit further and maybe see if we can get a bit more
detail about what the GPU thinks went wrong?

Jason
