On 11/18/25 20:35, Jason Gunthorpe wrote:
On Tue, Nov 18, 2025 at 07:29:22PM +0800, Baolu Lu wrote:
On 11/18/2025 3:47 PM, Tian, Kevin wrote:
From: Baolu Lu <[email protected]>
Sent: Tuesday, November 18, 2025 2:24 PM
On 11/18/25 12:04, Tian, Kevin wrote:
46 bits is not particularly big... Hmm, I wonder if we have some issue
with the sign-extend? iommupt does that properly and IIRC the old code
did not. Which of the page table formats is this using, second stage or
first stage?
Assume it's first stage for kernel IOVA, if available in hw
It's the first stage (x86_64 fmt) according to the PASID entry setup:
IOMMU dmar0: Root Table Address: 0x105a82000
B.D.F Root_entry Context_entry
PASID PASID_table_entry
00:02.0 0x0000000000000000:0x0000000105a85001
0x0000000000000000:0x0000000105a84405 0
0x0000000105a86000:0x0000000000000002:0x0000000000000049
so the 3rd experiment (if the former two don't show a difference) is
to force using the second stage to see whether it's caused by the
sign-extend logic.
I hardcoded the driver to always use the second stage for paging domain
translation, and it works now.
IOMMU dmar0: Root Table Address: 0x1049b6000
B.D.F Root_entry Context_entry
PASID PASID_table_entry
00:02.0 0x0000000000000000:0x00000001049ba001
0x0000000000000000:0x00000001049b9405 0
0x0000000000000000:0x0000000000000002:0x00000001049bb089
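For anyone reading the two dumps side by side, here is a minimal sketch of how the translation type can be read out of the PASID entry. It assumes the last of the three printed fields is the low qword of the scalable-mode PASID entry, and that PGTT sits in bits 8:6 of that qword (1 = first stage, 2 = second stage, 3 = nested, 4 = pass-through). The helper names are hypothetical and are not taken from the driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout of the low qword of a scalable-mode PASID entry. */
#define PASID_LO_PRESENT	(1ULL << 0)
#define PASID_LO_PGTT_SHIFT	6
#define PASID_LO_PGTT_MASK	0x7ULL

/* Assumed PGTT encodings. */
enum pasid_pgtt {
	PGTT_FIRST_STAGE  = 1,
	PGTT_SECOND_STAGE = 2,
	PGTT_NESTED       = 3,
	PGTT_PASS_THROUGH = 4,
};

/* Hypothetical helper: is the Present bit set in the low qword? */
static inline bool pasid_lo_present(uint64_t lo)
{
	return (lo & PASID_LO_PRESENT) != 0;
}

/* Hypothetical helper: extract the PGTT field from the low qword. */
static inline unsigned int pasid_lo_pgtt(uint64_t lo)
{
	return (unsigned int)((lo >> PASID_LO_PGTT_SHIFT) & PASID_LO_PGTT_MASK);
}

/* Hypothetical helper: human-readable name for a PGTT value. */
static inline const char *pasid_pgtt_name(unsigned int pgtt)
{
	switch (pgtt) {
	case PGTT_FIRST_STAGE:  return "first stage";
	case PGTT_SECOND_STAGE: return "second stage";
	case PGTT_NESTED:       return "nested";
	case PGTT_PASS_THROUGH: return "pass-through";
	default:                return "reserved";
	}
}
```

Under those assumptions, the low qword 0x49 from the failing run decodes as first stage and 0x1049bb089 from the working run decodes as second stage, which matches what the hardcoded driver change was meant to do.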
Okay, that is a great finding!
So either it is something about the sign extend or something about
x86_64. Given the similarity to vtdss, all the code around cache/iotlb
flushing is the same, so we can say that is working.
1) Can you run the test with CONFIG_DEBUG_GENERIC_PT=y? Let's see if
pt_check_install_leaf_args() fails?
No. It doesn't trigger any PT_WARN_ON() in pt_check_install_leaf_args().
2) Let's try disabling the sign extend function:
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -2818,8 +2818,7 @@ intel_iommu_domain_alloc_first_stage(struct device *dev,
else
cfg.common.hw_max_vasz_lg2 = 48;
cfg.common.hw_max_oasz_lg2 = 52;
- cfg.common.features = BIT(PT_FEAT_SIGN_EXTEND) |
- BIT(PT_FEAT_FLUSH_RANGE);
+ cfg.common.features = BIT(PT_FEAT_FLUSH_RANGE);
/* First stage always uses scalable mode */
if (!ecap_smpwc(iommu->ecap))
cfg.common.features |= BIT(PT_FEAT_DMA_INCOHERENT);
This doesn't make any difference.
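As a reference for what "sign extend" means here, a minimal userspace sketch follows. It assumes PT_FEAT_SIGN_EXTEND corresponds to the usual x86_64 canonical-address rule, where every bit above the top implemented VA bit must copy that bit. The function names are hypothetical and this is not the iommupt implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: sign-extend the low vasz_lg2 bits of va to 64 bits.
 * With vasz_lg2 == 48, bit 47 is copied into bits 63:48.
 */
static inline uint64_t va_sign_extend(uint64_t va, unsigned int vasz_lg2)
{
	uint64_t low_mask, sign_bit;

	assert(vasz_lg2 >= 1 && vasz_lg2 <= 64);
	if (vasz_lg2 == 64)
		return va;

	low_mask = (1ULL << vasz_lg2) - 1;
	sign_bit = 1ULL << (vasz_lg2 - 1);

	va &= low_mask;
	/* Fill the upper bits with ones only when the top VA bit is set. */
	return (va & sign_bit) ? (va | ~low_mask) : va;
}

/* Hypothetical helper: true if va is already in canonical form. */
static inline bool va_is_canonical(uint64_t va, unsigned int vasz_lg2)
{
	return va_sign_extend(va, vasz_lg2) == va;
}
```

Under this rule, any address that fits in 46 bits has bit 47 clear, so it is already canonical for a 48-bit VA size and the extension leaves it unchanged.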
3) Let's validate the mapping:
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2572,6 +2572,21 @@ int iommu_map_nosync(struct iommu_domain *domain, unsigned long iova,
else
trace_map(orig_iova, orig_paddr, orig_size);
+ if (!ret) {
+ paddr = orig_paddr;
+ for (iova = orig_iova; iova < orig_iova + orig_size; iova += PAGE_SIZE) {
+ phys_addr_t pt_paddr = ops->iova_to_phys(domain, iova);
+
+ if (pt_paddr != paddr) {
+ pr_warn("mapping: Bad physical storage %lx != %lx at %lx\n",
+ (unsigned long)paddr,
+ (unsigned long)pt_paddr, iova);
+ break;
+ }
+ paddr += PAGE_SIZE;
+ }
+ }
+
Maybe the physical is getting truncated for some reason?
The pr_warn() in the above code hasn't been triggered.
4) Please collect the map/unmap traces, including the return code
I only ran a typical test case named gem_exec_gttfill. The trace provided
by Chaitanya is more reasonable.
Thanks,
baolu