On Mon, Nov 10, 2025 at 12:06:30PM +0530, Borah, Chaitanya Kumar wrote:
> Hello Jason,
> 
> Hope you are doing well. I am Chaitanya from the linux graphics team in
> Intel.
> 
> This mail is regarding a regression we are seeing in our CI runs[1] on
> linux-next repository.
> 
> Since the version next-20251106 [2], we are seeing our tests timing out
> presumably caused by a GPU Hang.

Thank you for reporting this.

I don't have any immediate theory, so I think this will need some
debugging. Maybe Kevin or Lu have some ideas?

Some general thoughts on things to check:

1) Is there an iommu fault report? I did not see one in your dmesg,
   but maybe it was truncated? It is puzzling to see an iommu-related
   error without a fault report.

2) Could it be one of the special iommu behaviors to support iGPU that
   is not working? Maybe we missed one?

3) I seem to recall Lu tested the coherent cache flushing, but it is
   still a good question: is this iGPU cache incoherent with the CPU?
   Could this be a cache flushing bug? That is very hard to test, so
   it would not be a big surprise if it had a bug.

4) Nobody has reported any other problems so far, so I'm inclined to
   think map/unmap is working, but maybe there is some edge case the
   GPU driver is tripping over?

The lack of a fault report is very puzzling. Even if it were #3, I
would expect a fault to be the most likely outcome of missing
flushing. The lack of a fault report suggests the wrong physical
address was mapped as present, which points to #4.

Can you investigate a bit further and see if we can get more detail
on what the GPU thinks went wrong?

Jason
