> From: Joao Martins <[email protected]>
> Sent: Friday, April 29, 2022 5:10 AM
> 
> IOMMU advertises Access/Dirty bits if the extended capability
> DMAR register reports it (ECAP, mnemonic ECAP.SSADS). The first
> stage table, though, has no bit for advertising, unless referenced via

first-stage is compatible with the CPU page table, thus a/d bit support
is implied. But for dirty tracking I'm fine with only supporting it
with second-stage, as first-stage will be used only for the guest in
the nesting case (though in concept first-stage could also be used for
IOVA when nesting is disabled, there is no plan to do so on Intel
platforms).

> a scalable-mode PASID Entry. Relevant Intel IOMMU SDM ref for first stage
> table "3.6.2 Accessed, Extended Accessed, and Dirty Flags" and second
> stage table "3.7.2 Accessed and Dirty Flags".
> 
> To enable it scalable-mode for the second-stage table is required,
> so limit the use of dirty-bit to scalable-mode and discarding the
> first stage configured DMAR domains. To use SSADS, we set a bit in

The above is inaccurate. The dirty bit is only supported in scalable
mode, so there is no limit per se.

> the scalable-mode PASID Table entry, by setting bit 9 (SSADE). When

"To use SSADS, we set bit 9 (SSADE) in the scalable-mode PASID table
entry"

> doing so, flush all iommu caches. Relevant SDM refs:
> 
> "3.7.2 Accessed and Dirty Flags"
> "6.5.3.3 Guidance to Software for Invalidations,
>  Table 23. Guidance to Software for Invalidations"
> 
> Dirty bit on the PTE is located in the same location (bit 9). The IOTLB

I'm not sure what information 'same location' here tries to convey...

> caches some attributes when SSADE is enabled and dirty-ness information,

Be direct that the dirty bit is cached in the IOTLB, thus any change of
that bit requires flushing the IOTLB.

> so we also need to flush IOTLB to make sure IOMMU attempts to set the
> dirty bit again. Relevant manuals over the hardware translation is
> chapter 6 with some special mention to:
> 
> "6.2.3.1 Scalable-Mode PASID-Table Entry Programming Considerations"
> "6.2.4 IOTLB"
> 
> Signed-off-by: Joao Martins <[email protected]>
> ---
> Shouldn't probably be as aggressive as to flush all; needs
> checking with hardware (and invalidations guidance) as to understand
> what exactly needs flush.

yes, definitely not required to flush all. You can follow table 23
for software guidance for invalidations.
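
For example (just a sketch, not checked against Table 23 in detail, and
assuming I recall the existing helpers correctly): when SSADE is toggled
in a live PASID entry, invalidating the PASID-cache entry plus a
PASID-selective IOTLB flush for the affected domain should be enough,
where 'did' below stands for the domain id of the dmar_domain on that
iommu:

	/* sketch: scope the flush instead of flushing everything */
	pasid_cache_invalidation_with_pasid(iommu, did, pasid);
	qi_flush_piotlb(iommu, did, pasid, 0, -1, 0);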

> ---
>  drivers/iommu/intel/iommu.c | 109 ++++++++++++++++++++++++++++++++++++
>  drivers/iommu/intel/pasid.c |  76 +++++++++++++++++++++++++
>  drivers/iommu/intel/pasid.h |   7 +++
>  include/linux/intel-iommu.h |  14 +++++
>  4 files changed, 206 insertions(+)
> 
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index ce33f85c72ab..92af43f27241 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -5089,6 +5089,113 @@ static void intel_iommu_iotlb_sync_map(struct iommu_domain *domain,
>       }
>  }
> 
> +static int intel_iommu_set_dirty_tracking(struct iommu_domain *domain,
> +                                       bool enable)
> +{
> +     struct dmar_domain *dmar_domain = to_dmar_domain(domain);
> +     struct device_domain_info *info;
> +     unsigned long flags;
> +     int ret = -EINVAL;
> +
> +     spin_lock_irqsave(&device_domain_lock, flags);
> +     if (list_empty(&dmar_domain->devices)) {
> +             spin_unlock_irqrestore(&device_domain_lock, flags);
> +             return ret;
> +     }

or return success here and just don't set any dirty bitmap in
read_and_clear_dirty()?

btw I think every iommu driver needs to record the tracking status, so
that once the capability uAPI is introduced, an attach request can be
rejected if a device which doesn't claim dirty tracking support is
attached to a domain which already has dirty_tracking enabled.
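
Roughly something like this (field name and placement are made up, just
to illustrate the idea):

	/* hypothetical flag in struct dmar_domain */
	u8	dirty_tracking:1;

	/* then in the attach path, once the capability uAPI exists */
	if (dmar_domain->dirty_tracking &&
	    !(sm_supported(iommu) && ecap_slads(iommu->ecap)))
		return -EINVAL;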

> +
> +     list_for_each_entry(info, &dmar_domain->devices, link) {
> +             if (!info->dev || (info->domain != dmar_domain))
> +                     continue;

why would there be a device linked under a dmar_domain but its
internal domain pointer doesn't point to that dmar_domain?

> +
> +             /* Dirty tracking is second-stage level SM only */
> +             if ((info->domain && domain_use_first_level(info->domain)) ||
> +                 !ecap_slads(info->iommu->ecap) ||
> +                 !sm_supported(info->iommu) || !intel_iommu_sm) {

sm_supported() already covers the check on intel_iommu_sm.
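
i.e. (quoting the macro from memory):

	#define sm_supported(iommu)	(intel_iommu_sm && ecap_smts((iommu)->ecap))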

> +                     ret = -EOPNOTSUPP;
> +                     continue;
> +             }
> +
> +             ret = intel_pasid_setup_dirty_tracking(info->iommu, info->domain,
> +                                                  info->dev, PASID_RID2PASID,
> +                                                  enable);
> +             if (ret)
> +                     break;
> +     }
> +     spin_unlock_irqrestore(&device_domain_lock, flags);
> +
> +     /*
> +      * We need to flush context TLB and IOTLB with any cached translations
> +      * to force the incoming DMA requests to have their IOTLB entries
> +      * tagged with A/D bits
> +      */
> +     intel_flush_iotlb_all(domain);
> +     return ret;
> +}
> +
> +static int intel_iommu_get_dirty_tracking(struct iommu_domain *domain)
> +{
> +     struct dmar_domain *dmar_domain = to_dmar_domain(domain);
> +     struct device_domain_info *info;
> +     unsigned long flags;
> +     int ret = 0;
> +
> +     spin_lock_irqsave(&device_domain_lock, flags);
> +     list_for_each_entry(info, &dmar_domain->devices, link) {
> +             if (!info->dev || (info->domain != dmar_domain))
> +                     continue;
> +
> +             /* Dirty tracking is second-stage level SM only */
> +             if ((info->domain && domain_use_first_level(info->domain)) ||
> +                 !ecap_slads(info->iommu->ecap) ||
> +                 !sm_supported(info->iommu) || !intel_iommu_sm) {
> +                     ret = -EOPNOTSUPP;
> +                     continue;
> +             }
> +
> +             if (!intel_pasid_dirty_tracking_enabled(info->iommu, info->domain,
> +                                              info->dev, PASID_RID2PASID)) {
> +                     ret = -EINVAL;
> +                     break;
> +             }
> +     }
> +     spin_unlock_irqrestore(&device_domain_lock, flags);
> +
> +     return ret;
> +}

All above can be translated to a single status bit in dmar_domain.
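
i.e. with the hypothetical per-domain dirty_tracking bit suggested
above, intel_iommu_get_dirty_tracking() and the device-list walk could
collapse to:

	if (!dmar_domain->dirty_tracking)
		return -EINVAL;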

> +
> +static int intel_iommu_read_and_clear_dirty(struct iommu_domain *domain,
> +                                         unsigned long iova, size_t size,
> +                                         struct iommu_dirty_bitmap *dirty)
> +{
> +     struct dmar_domain *dmar_domain = to_dmar_domain(domain);
> +     unsigned long end = iova + size - 1;
> +     unsigned long pgsize;
> +     int ret;
> +
> +     ret = intel_iommu_get_dirty_tracking(domain);
> +     if (ret)
> +             return ret;
> +
> +     do {
> +             struct dma_pte *pte;
> +             int lvl = 0;
> +
> +             pte = pfn_to_dma_pte(dmar_domain, iova >> VTD_PAGE_SHIFT, &lvl);

it's probably fine as the starting point but moving forward this could
be further optimized so there is no need to walk from L4->L3->L2->L1
for every pte.
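
For instance (illustrative only; superpage and non-present upper-level
handling elided), once a leaf PTE is found the loop could continue
through the sibling entries of the same page-table page before falling
back to pfn_to_dma_pte():

	pte = pfn_to_dma_pte(dmar_domain, iova >> VTD_PAGE_SHIFT, &lvl);
	pgsize = level_size(lvl) << VTD_PAGE_SHIFT;

	while (pte && iova < end) {
		if (dma_pte_present(pte) &&
		    dma_sl_pte_test_and_clear_dirty(pte))
			iommu_dirty_bitmap_record(dirty, iova, pgsize);
		iova += pgsize;
		pte++;
		if (!offset_in_page(pte))
			break;	/* crossed into the next page-table page */
	}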

> +             pgsize = level_size(lvl) << VTD_PAGE_SHIFT;
> +             if (!pte || !dma_pte_present(pte)) {
> +                     iova += pgsize;
> +                     continue;
> +             }
> +
> +             /* It is writable, set the bitmap */
> +             if (dma_sl_pte_test_and_clear_dirty(pte))
> +                     iommu_dirty_bitmap_record(dirty, iova, pgsize);
> +             iova += pgsize;
> +     } while (iova < end);
> +
> +     return 0;
> +}
> +
>  const struct iommu_ops intel_iommu_ops = {
>       .capable                = intel_iommu_capable,
>       .domain_alloc           = intel_iommu_domain_alloc,
> @@ -5119,6 +5226,8 @@ const struct iommu_ops intel_iommu_ops = {
>               .iotlb_sync             = intel_iommu_tlb_sync,
>               .iova_to_phys           = intel_iommu_iova_to_phys,
>               .free                   = intel_iommu_domain_free,
> +             .set_dirty_tracking     = intel_iommu_set_dirty_tracking,
> +             .read_and_clear_dirty   = intel_iommu_read_and_clear_dirty,
>       }
>  };
> 
> diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
> index 10fb82ea467d..90c7e018bc5c 100644
> --- a/drivers/iommu/intel/pasid.c
> +++ b/drivers/iommu/intel/pasid.c
> @@ -331,6 +331,11 @@ static inline void pasid_set_bits(u64 *ptr, u64 mask, u64 bits)
>       WRITE_ONCE(*ptr, (old & ~mask) | bits);
>  }
> 
> +static inline u64 pasid_get_bits(u64 *ptr)
> +{
> +     return READ_ONCE(*ptr);
> +}
> +
>  /*
>   * Setup the DID(Domain Identifier) field (Bit 64~79) of scalable mode
>   * PASID entry.
> @@ -389,6 +394,36 @@ static inline void pasid_set_fault_enable(struct pasid_entry *pe)
>       pasid_set_bits(&pe->val[0], 1 << 1, 0);
>  }
> 
> +/*
> + * Enable second level A/D bits by setting the SLADE (Second Level
> + * Access Dirty Enable) field (Bit 9) of a scalable mode PASID
> + * entry.
> + */
> +static inline void pasid_set_ssade(struct pasid_entry *pe)
> +{
> +     pasid_set_bits(&pe->val[0], 1 << 9, 1 << 9);
> +}
> +
> +/*
> + * Disable second level A/D bits by clearing the SLADE (Second Level
> + * Access Dirty Enable) field (Bit 9) of a scalable mode PASID
> + * entry.
> + */
> +static inline void pasid_clear_ssade(struct pasid_entry *pe)
> +{
> +     pasid_set_bits(&pe->val[0], 1 << 9, 0);
> +}
> +
> +/*
> + * Check whether second level A/D bits are enabled, i.e. whether the
> + * SLADE (Second Level Access Dirty Enable) field (Bit 9) of a scalable
> + * mode PASID entry is set.
> + */
> +static inline bool pasid_get_ssade(struct pasid_entry *pe)
> +{
> +     return pasid_get_bits(&pe->val[0]) & (1 << 9);
> +}
> +
>  /*
>   * Setup the SRE(Supervisor Request Enable) field (Bit 128) of a
>   * scalable mode PASID entry.
> @@ -725,6 +760,47 @@ int intel_pasid_setup_second_level(struct intel_iommu *iommu,
>       return 0;
>  }
> 
> +/*
> + * Set up dirty tracking on a second-level only translation type.
> + */
> +int intel_pasid_setup_dirty_tracking(struct intel_iommu *iommu,
> +                                  struct dmar_domain *domain,
> +                                  struct device *dev, u32 pasid,
> +                                  bool enabled)
> +{
> +     struct pasid_entry *pte;
> +
> +     pte = intel_pasid_get_entry(dev, pasid);
> +     if (!pte) {
> +             dev_err(dev, "Failed to get pasid entry of PASID %d\n", pasid);
> +             return -ENODEV;
> +     }
> +
> +     if (enabled)
> +             pasid_set_ssade(pte);
> +     else
> +             pasid_clear_ssade(pte);
> +     return 0;
> +}
> +
> +/*
> + * Check if dirty tracking is enabled on a second-level only translation type.
> + */
> +bool intel_pasid_dirty_tracking_enabled(struct intel_iommu *iommu,
> +                                     struct dmar_domain *domain,
> +                                     struct device *dev, u32 pasid)
> +{
> +     struct pasid_entry *pte;
> +
> +     pte = intel_pasid_get_entry(dev, pasid);
> +     if (!pte) {
> +             dev_err(dev, "Failed to get pasid entry of PASID %d\n", pasid);
> +             return false;
> +     }
> +
> +     return pasid_get_ssade(pte);
> +}
> +
>  /*
>   * Set up the scalable mode pasid entry for passthrough translation type.
>   */
> diff --git a/drivers/iommu/intel/pasid.h b/drivers/iommu/intel/pasid.h
> index ab4408c824a5..3dab86017228 100644
> --- a/drivers/iommu/intel/pasid.h
> +++ b/drivers/iommu/intel/pasid.h
> @@ -115,6 +115,13 @@ int intel_pasid_setup_first_level(struct intel_iommu *iommu,
>  int intel_pasid_setup_second_level(struct intel_iommu *iommu,
>                                  struct dmar_domain *domain,
>                                  struct device *dev, u32 pasid);
> +int intel_pasid_setup_dirty_tracking(struct intel_iommu *iommu,
> +                                  struct dmar_domain *domain,
> +                                  struct device *dev, u32 pasid,
> +                                  bool enabled);
> +bool intel_pasid_dirty_tracking_enabled(struct intel_iommu *iommu,
> +                                     struct dmar_domain *domain,
> +                                     struct device *dev, u32 pasid);
>  int intel_pasid_setup_pass_through(struct intel_iommu *iommu,
>                                  struct dmar_domain *domain,
>                                  struct device *dev, u32 pasid);
> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> index 5cfda90b2cca..1328d1805197 100644
> --- a/include/linux/intel-iommu.h
> +++ b/include/linux/intel-iommu.h
> @@ -47,6 +47,9 @@
>  #define DMA_FL_PTE_DIRTY     BIT_ULL(6)
>  #define DMA_FL_PTE_XD                BIT_ULL(63)
> 
> +#define DMA_SL_PTE_DIRTY_BIT 9
> +#define DMA_SL_PTE_DIRTY     BIT_ULL(DMA_SL_PTE_DIRTY_BIT)
> +
>  #define ADDR_WIDTH_5LEVEL    (57)
>  #define ADDR_WIDTH_4LEVEL    (48)
> 
> @@ -677,6 +680,17 @@ static inline bool dma_pte_present(struct dma_pte *pte)
>       return (pte->val & 3) != 0;
>  }
> 
> +static inline bool dma_sl_pte_dirty(struct dma_pte *pte)
> +{
> +     return (pte->val & DMA_SL_PTE_DIRTY) != 0;
> +}
> +
> +static inline bool dma_sl_pte_test_and_clear_dirty(struct dma_pte *pte)
> +{
> +     return test_and_clear_bit(DMA_SL_PTE_DIRTY_BIT,
> +                               (unsigned long *)&pte->val);
> +}
> +
>  static inline bool dma_pte_superpage(struct dma_pte *pte)
>  {
>       return (pte->val & DMA_PTE_LARGE_PAGE);
> --
> 2.17.2
