On Tue, Aug 20, 2019 at 05:50:06PM +0100, Robin Murphy wrote:
> On 20/08/2019 16:45, Will Deacon wrote:
> > When invalidating the ATC for an PCIe endpoint using ATS, we must take
> > care to complete invalidation of the main SMMU TLBs beforehand, otherwise
> > the device could immediately repopulate its ATC with stale translations.
> > 
> > Hooking the ATC invalidation into ->unmap() as we currently do does the
> > exact opposite: it ensures that the ATC is invalidated *before*  the
> > main TLBs, which is bogus.
> > 
> > Move ATC invalidation into the actual (leaf) invalidation routines so
> > that it is always called after completing main TLB invalidation.
> > 
> > Signed-off-by: Will Deacon <[email protected]>
> > ---
> >   drivers/iommu/arm-smmu-v3.c | 12 +++++-------
> >   1 file changed, 5 insertions(+), 7 deletions(-)
> > 
> > diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> > index 9096eca0c480..183a1c121179 100644
> > --- a/drivers/iommu/arm-smmu-v3.c
> > +++ b/drivers/iommu/arm-smmu-v3.c
> > @@ -1961,6 +1961,7 @@ static void arm_smmu_tlb_inv_context(void *cookie)
> >      */
> >     arm_smmu_cmdq_issue_cmd(smmu, &cmd);
> >     arm_smmu_cmdq_issue_sync(smmu);
> > +   arm_smmu_atc_inv_domain(smmu_domain, 0, 0, 0);
> >   }
> >   static void arm_smmu_tlb_inv_range(unsigned long iova, size_t size,
> > @@ -1969,7 +1970,7 @@ static void arm_smmu_tlb_inv_range(unsigned long 
> > iova, size_t size,
> >   {
> >     u64 cmds[CMDQ_BATCH_ENTRIES * CMDQ_ENT_DWORDS];
> >     struct arm_smmu_device *smmu = smmu_domain->smmu;
> > -   unsigned long end = iova + size;
> > +   unsigned long start = iova, end = iova + size;
> >     int i = 0;
> >     struct arm_smmu_cmdq_ent cmd = {
> >             .tlbi = {
> > @@ -1998,6 +1999,8 @@ static void arm_smmu_tlb_inv_range(unsigned long 
> > iova, size_t size,
> >     }
> >     arm_smmu_cmdq_issue_cmdlist(smmu, cmds, i, true);
> > +   if (leaf)
> > +           arm_smmu_atc_inv_domain(smmu_domain, 0, start, size);
> 
> I still need to get up to speed on your cmdlist and unmap changes, but in
> isolation this "if (leaf)" guard looks a bit dodgy - in the case where
> io-pgtable goes to unmap a 2MB block, finds it's mapped as a table, and
> blows it away in one go, we'll only see a non-leaf TLBI call for that range,
> no?

Yuck, this is quite horrible. I don't think the ATC is permitted to cache
intermediate walks, so we actually don't need the thing to be synchronous
here. But if we update the gather structure as well, then we risk
over-invalidating for the non-ATS case when we get to the sync.

I'll have a think.

> Tangentially, does arm_smmu_atc_inv_domain() really need to sync once for
> each individual master, or could that do better as well? Not something we
> should worry about right now, but now that I'm looking I may as well note it
> for the record.

Indeed -- that function should be rewritten using the cmdlist() stuff I've
done. I'm just reluctant to start optimising for the ATS case when I'm not
able to test it.

Thanks,

Will
_______________________________________________
iommu mailing list
[email protected]
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Reply via email to