On Tue, Nov 18, 2025 at 07:29:22PM +0800, Baolu Lu wrote:
> On 11/18/2025 3:47 PM, Tian, Kevin wrote:
> > > From: Baolu Lu <[email protected]>
> > > Sent: Tuesday, November 18, 2025 2:24 PM
> > >
> > > On 11/18/25 12:04, Tian, Kevin wrote:
> > > > > 46 bits is not particularly big... Hmm, I wonder if we have some issue
> > > > > with the sign-extend? iommupt does that properly and IIRC the old code
> > > > > did not. Which of the page table formats is this using second stage or
> > > > > first stage?
> > > > Assume it's first stage for kernel IOVA, if available in hw
> > >
> > > It's the first stage (x86_64 fmt) according to the PASID entry setup:
> > >
> > > IOMMU dmar0: Root Table Address: 0x105a82000
> > > B.D.F Root_entry Context_entry
> > > PASID PASID_table_entry
> > > 00:02.0 0x0000000000000000:0x0000000105a85001
> > > 0x0000000000000000:0x0000000105a84405 0
> > > 0x0000000105a86000:0x0000000000000002:0x0000000000000049
> > >
> >
> > so the 3rd experiment (if the former two doesn't show difference) is
> > to force using second stage to see whether it's caused by the
> > sign-extend logic.
>
> I hardcoded the driver to always use the second stage for paging domain
> translation, and it works now.
>
> IOMMU dmar0: Root Table Address: 0x1049b6000
> B.D.F Root_entry Context_entry
> PASID PASID_table_entry
> 00:02.0 0x0000000000000000:0x00000001049ba001
> 0x0000000000000000:0x00000001049b9405 0
> 0x0000000000000000:0x0000000000000002:0x00000001049bb089

Okay, that is a great finding!

So either it is something about the sign extend or something about
x86_64. Given the similarity to vtdss, all the code around cache/iotlb
flushing is the same, so we can say that is working.

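For reference, this is what "sign extend" means for a 48-bit first stage
input address. It is only a minimal user-space sketch, not the iommupt
code; sign_extend_va() and va_is_canonical() are hypothetical names made
up for illustration:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: treat the low vasz_lg2 bits of va as a signed
 * value and extend the top bit through bit 63.
 * vasz_lg2 must be in the range 1..64.
 */
static uint64_t sign_extend_va(uint64_t va, unsigned int vasz_lg2)
{
	uint64_t sign = 1ULL << (vasz_lg2 - 1);
	uint64_t mask = (vasz_lg2 == 64) ? ~0ULL : (1ULL << vasz_lg2) - 1;

	va &= mask;
	/* Portable sign extension: clear the sign bit, then subtract it back. */
	return (va ^ sign) - sign;
}

/* Hypothetical helper: true if the upper bits all match bit vasz_lg2 - 1. */
static bool va_is_canonical(uint64_t va, unsigned int vasz_lg2)
{
	return sign_extend_va(va, vasz_lg2) == va;
}
```

With vasz_lg2 = 48, an IOVA that only uses 46 bits is in the lower half
and is canonical as-is, while 0x0000800000000000 is not and would have to
be presented as 0xffff800000000000.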
1) Can you run the test with CONFIG_DEBUG_GENERIC_PT=y? Let's see if
   pt_check_install_leaf_args() fails.

2) Let's try disabling the sign extend feature:

--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -2818,8 +2818,7 @@ intel_iommu_domain_alloc_first_stage(struct device *dev,
 	else
 		cfg.common.hw_max_vasz_lg2 = 48;
 	cfg.common.hw_max_oasz_lg2 = 52;
-	cfg.common.features = BIT(PT_FEAT_SIGN_EXTEND) |
-			      BIT(PT_FEAT_FLUSH_RANGE);
+	cfg.common.features = BIT(PT_FEAT_FLUSH_RANGE);
 	/* First stage always uses scalable mode */
 	if (!ecap_smpwc(iommu->ecap))
 		cfg.common.features |= BIT(PT_FEAT_DMA_INCOHERENT);

3) Let's validate the mapping:

--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2572,6 +2572,21 @@ int iommu_map_nosync(struct iommu_domain *domain, unsigned long iova,
 	else
 		trace_map(orig_iova, orig_paddr, orig_size);
+	if (!ret) {
+		paddr = orig_paddr;
+		for (iova = orig_iova; iova < orig_iova + orig_size; iova += PAGE_SIZE) {
+			phys_addr_t pt_paddr = ops->iova_to_phys(domain, iova);
+
+			if (pt_paddr != paddr) {
+				pr_warn("mapping: Bad physical storage %lx != %lx at %lx\n",
+					(unsigned long)paddr,
+					(unsigned long)pt_paddr, iova);
+				break;
+			}
+			paddr += PAGE_SIZE;
+		}
+	}
+

Maybe the physical address is getting truncated for some reason?

4) Please collect the map/unmap traces, including the return code.

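To illustrate the kind of failure the check in 3) is meant to catch, here
is a toy user-space model of the same walk. Everything in it is an
assumption for illustration only: the fixed-offset "translation", the
SKETCH_* constants, and the function names are all made up, and the
32-bit cast is just one arbitrary way a physical address could get
truncated, not a claim about where the real bug is:

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL
#define SKETCH_OFFSET    0x100000000ULL	/* toy mapping: paddr = iova + 4GiB */

typedef uint64_t (*sketch_lookup_fn)(uint64_t iova);

/* Toy lookup that returns the full physical address. */
static uint64_t sketch_lookup_ok(uint64_t iova)
{
	return iova + SKETCH_OFFSET;
}

/* Toy lookup that loses the upper 32 bits of the physical address. */
static uint64_t sketch_lookup_truncated(uint64_t iova)
{
	return (uint32_t)(iova + SKETCH_OFFSET);
}

/*
 * Walk the range one page at a time and compare the lookup result with
 * the expected physical address. Returns the number of pages that
 * matched before the first mismatch.
 */
static size_t sketch_count_matching_pages(sketch_lookup_fn lookup,
					  uint64_t iova, uint64_t paddr,
					  uint64_t size)
{
	size_t matched = 0;

	for (uint64_t off = 0; off < size; off += SKETCH_PAGE_SIZE) {
		if (lookup(iova + off) != paddr + off)
			break;
		matched++;
	}
	return matched;
}
```

If every page matches, the count equals size / SKETCH_PAGE_SIZE; with the
truncating lookup the very first page already mismatches, which is what
the pr_warn() in the debug patch would report.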
Jason