> From: Jason Gunthorpe <[email protected]>
> Sent: Tuesday, November 18, 2025 9:30 AM
>
> Also, I'd like to know if this is happening 100% reproducibly or if it
> is flaky. Also, just to confirm, this is 68s after boot and right after
> starting a test, so it looks like the test is just not working at all?
It looks like other tests succeeded before this one, e.g. the
first SUCCESS was much earlier:
<7>[ 44.799575] [IGT] core_auth: finished subtest basic-auth, SUCCESS
and the i915 module was loaded manually at 40s after boot:
<7>[ 40.492621] [IGT] i915_module_load: executing
>
> I'm still interested to know if there is an iommu error that is
> somehow not getting into the log?
Just re-checked the code:
dmar_fault() -> dmar_fault_do_one() -> dmar_fault_dump_ptes():
pr_info("Dump %s table entries for IOVA 0x%llx\n", iommu->name, addr);
Before reaching that point, there are several early exits:
- advanced fault log, which I don't think will happen (Baolu?)
- rate-limiting (but we haven't seen any error at all so far)
- irq-remapping fault (gets its own print)
>
> Finally, it is interesting that this test prints this:
>
> <5>[ 68.824598] i915 0000:00:02.0: Using 46-bit DMA addresses
>
> Which comes from here:
>
> if (dma_limit > DMA_BIT_MASK(32) && dev->iommu->pci_32bit_workaround) {
> iova = alloc_iova_fast(iovad, iova_len,
> DMA_BIT_MASK(32) >> shift, false);
> if (iova)
> goto done;
>
> dev->iommu->pci_32bit_workaround = false;
> dev_notice(dev, "Using %d-bit DMA addresses\n",
> bits_per(dma_limit));
> }
>
> Which means dma-iommu has exceeded the 32 bit pool and is allocating
> high addresses now?
>
> It prints that and then immediately fails? Seems like a clue!
Yes, that's something to study further.
According to the log:
<7>[ 40.991058] i915 0000:00:02.0: [drm:i915_ggtt_probe_hw [i915]] GGTT size = 4096M
If the test gem_exec_gttfill tries to fill the entire 4G GGTT (device MMU),
it may need to allocate 4G of system memory and map it into IOVA space.
Then the above path could be triggered.
The CT message buffer is allocated at driver initialization time. Ideally
the IOVA mapping of that buffer shouldn't be affected by the new
map/unmap operations caused by gttfill, but...
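
As a rough illustration of the arithmetic behind that guess, here is a
userspace sketch, not kernel code. dma_bit_mask() and bits_per() are
simplified models of the kernel's DMA_BIT_MASK() and bits_per();
exceeds_32bit_pool() is a hypothetical helper made up for this example,
and the amount of low IOVA space already in use is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace model of the kernel's DMA_BIT_MASK(n): the low n bits set. */
static uint64_t dma_bit_mask(unsigned int n)
{
	assert(n >= 1 && n <= 64);
	return n == 64 ? ~UINT64_C(0) : (UINT64_C(1) << n) - 1;
}

/* Userspace model of bits_per(): number of bits needed to represent v. */
static unsigned int bits_per(uint64_t v)
{
	unsigned int bits = 1;

	while (v >>= 1)
		bits++;
	return bits;
}

/*
 * Hypothetical helper: true if `bytes` of new mappings cannot all fit
 * below the 32-bit boundary when `in_use` bytes of that space are
 * already taken (reserved ranges, buffers mapped at driver init, ...).
 */
static bool exceeds_32bit_pool(uint64_t bytes, uint64_t in_use)
{
	uint64_t pool = dma_bit_mask(32) + 1;	/* 4 GiB of IOVA space */

	return in_use >= pool || bytes > pool - in_use;
}
```

With these definitions, bits_per(dma_bit_mask(46)) evaluates to 46, which
matches the "Using 46-bit DMA addresses" message, and
exceeds_32bit_pool(4096ULL << 20, 4096) is true: a full 4096M fill cannot
stay below 4G once even a single page of the low range is already in use.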
>
> Is there a failing map call perhaps due to the driver setting up the
> wrong iova range for the table? iommupt is strict about enforcing the
> IOVA limitation. A failing map call might produce this outcome (though
> I expect an iommu error log)
>
> The map traces only log on success though, so please add a print on
> failure too..
... yeah, it'd be helpful to get a more accurate error from the
test side.
We could also add a trace to see whether it's the first 48-bit IOVA
range being allocated.
>
> 46 bits is not particularly big... Hmm, I wonder if we have some issue
> with the sign-extend? iommupt does that properly and IIRC the old code
> did not. Which of the page table formats is this using, second stage or
> first stage?
I assume it's first stage for kernel IOVA, if available in hardware.
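
On the sign-extend question, a small userspace sketch of the check may
help frame it. It assumes 4-level first-stage paging with a 48-bit input
address, where bits 63:48 must replicate bit 47; that width is an
assumption and has not been confirmed for this platform. The helper names
are made up for illustration and are not the iommupt API.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Sign-extend bit (va_bits - 1) of addr through bit 63, discarding any
 * bits above va_bits first. Valid for va_bits in the range 1..63.
 */
static uint64_t sign_extend_iova(uint64_t addr, unsigned int va_bits)
{
	uint64_t sign = UINT64_C(1) << (va_bits - 1);
	uint64_t low = addr & ((sign << 1) - 1);

	/* Clears the sign bit and borrows into the upper bits if it was set. */
	return (low ^ sign) - sign;
}

/* True if addr is already in canonical (sign-extended) form. */
static bool is_canonical_iova(uint64_t addr, unsigned int va_bits)
{
	return sign_extend_iova(addr, va_bits) == addr;
}
```

Under that assumption, the highest 46-bit address (1ULL << 46) - 1 has
bit 47 clear, so is_canonical_iova() returns true for it with va_bits =
48 and no extension is needed. Only addresses with bit 47 set, such as
1ULL << 47, would have to become 0xffff800000000000.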