Another piece of information:

The observations are the same if the current pci-device (sd/mmc
controller) is detached, and another pci-device (sound controller) is
attached to the guest instead.

So it looks like we can rule out any (pci-)device-specific issue.


For reference, here are the details of the other pci-device I tried with:

###############################################
sudo lspci -vvv

00:1b.0 Audio device: Intel Corporation 6 Series/C200 Series Chipset Family High Definition Audio Controller (rev 04)
    DeviceName:  Onboard Audio
    Subsystem: Dell 6 Series/C200 Series Chipset Family High Definition Audio Controller
    Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0, Cache Line Size: 64 bytes
    Interrupt: pin A routed to IRQ 31
    IOMMU group: 5
    Region 0: Memory at e2e60000 (64-bit, non-prefetchable) [size=16K]
    Capabilities: [50] Power Management version 2
        Flags: PMEClk- DSI- D1- D2- AuxCurrent=55mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
        Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [60] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Address: 00000000fee00358  Data: 0000
    Capabilities: [70] Express (v1) Root Complex Integrated Endpoint, MSI 00
        DevCap:    MaxPayload 128 bytes, PhantFunc 0
            ExtTag- RBE- FLReset+
        DevCtl:    CorrErr- NonFatalErr- FatalErr- UnsupReq-
            RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop- FLReset-
            MaxPayload 128 bytes, MaxReadReq 128 bytes
        DevSta:    CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
    Capabilities: [100 v1] Virtual Channel
        Caps:    LPEVC=0 RefClk=100ns PATEntryBits=1
        Arb:    Fixed- WRR32- WRR64- WRR128-
        Ctrl:    ArbSelect=Fixed
        Status:    InProgress-
        VC0:    Caps:    PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
            Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
            Ctrl:    Enable+ ID=0 ArbSelect=Fixed TC/VC=01
            Status:    NegoPending- InProgress-
        VC1:    Caps:    PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
            Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
            Ctrl:    Enable+ ID=1 ArbSelect=Fixed TC/VC=22
            Status:    NegoPending- InProgress-
    Capabilities: [130 v1] Root Complex Link
        Desc:    PortNumber=0f ComponentID=00 EltType=Config
        Link0:    Desc:    TargetPort=00 TargetComponent=00 AssocRCRB- LinkType=MemMapped LinkValid+
            Addr:    00000000fed1c000
    Kernel driver in use: snd_hda_intel
    Kernel modules: snd_hda_intel
###############################################

On Fri, Oct 22, 2021 at 11:03 PM Ajay Garg <ajaygargn...@gmail.com> wrote:
>
> Ping ..
>
> Are there any updates on this, please?
>
> It would be great to have the fix upstreamed (properly, of course).
>
> Right now, the patch contains the suggested change of explicitly and
> properly clearing out the dma-mappings when unmap is called.
> Please let me know how I can help, including testing/debugging
> other approaches if required.
>
>
> Many thanks to Alex and Lu for their continued support on the issue.
>
>
>
> P.S. :
>
> I might have missed mentioning the information about the device that
> causes the flooding.
> Please find it below:
>
> ######################################
> sudo lspci -vvv
>
> 0a:00.0 SD Host controller: O2 Micro, Inc. OZ600FJ0/OZ900FJ0/OZ600FJS SD/MMC Card Reader Controller (rev 05) (prog-if 01)
>     Subsystem: Dell OZ600FJ0/OZ900FJ0/OZ600FJS SD/MMC Card Reader Controller
>     Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
>     Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>     Latency: 0, Cache Line Size: 64 bytes
>     Interrupt: pin A routed to IRQ 17
>     IOMMU group: 14
>     Region 0: Memory at e2c20000 (32-bit, non-prefetchable) [size=512]
>     Capabilities: [a0] Power Management version 3
>         Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
>         Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
>     Capabilities: [48] MSI: Enable- Count=1/1 Maskable+ 64bit+
>         Address: 0000000000000000  Data: 0000
>         Masking: 00000000  Pending: 00000000
>     Capabilities: [80] Express (v1) Endpoint, MSI 00
>         DevCap:    MaxPayload 128 bytes, PhantFunc 0, Latency L0s <4us, L1 <64us
>             ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 10.000W
>         DevCtl:    CorrErr- NonFatalErr- FatalErr- UnsupReq-
>             RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
>             MaxPayload 128 bytes, MaxReadReq 512 bytes
>         DevSta:    CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
>         LnkCap:    Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <512ns, L1 <64us
>             ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp-
>         LnkCtl:    ASPM L0s Enabled; RCB 64 bytes, Disabled- CommClk-
>             ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
>         LnkSta:    Speed 2.5GT/s (ok), Width x1 (ok)
>             TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
>     Capabilities: [100 v1] Virtual Channel
>         Caps:    LPEVC=0 RefClk=100ns PATEntryBits=1
>         Arb:    Fixed- WRR32- WRR64- WRR128-
>         Ctrl:    ArbSelect=Fixed
>         Status:    InProgress-
>         VC0:    Caps:    PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
>             Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
>             Ctrl:    Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
>             Status:    NegoPending- InProgress-
>     Capabilities: [200 v1] Advanced Error Reporting
>         UESta:    DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>         UEMsk:    DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>         UESvrt:    DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
>         CESta:    RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
>         CEMsk:    RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
>         AERCap:    First Error Pointer: 00, ECRCGenCap- ECRCGenEn- ECRCChkCap- ECRCChkEn-
>             MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
>         HeaderLog: 00000000 00000000 00000000 00000000
>     Kernel driver in use: sdhci-pci
>     Kernel modules: sdhci_pci
> ######################################
>
>
>
> Thanks and Regards,
> Ajay
>
> On Tue, Oct 12, 2021 at 7:27 PM Ajay Garg <ajaygargn...@gmail.com> wrote:
> >
> > Origins at :
> > https://lists.linuxfoundation.org/pipermail/iommu/2021-October/thread.html
> >
> > === Changes from v1 => v2 ===
> >
> > a)
> > Improved patch-description.
> >
> > b)
> > A more root-level fix, as suggested by
> >
> >         1.
> >         Alex Williamson <alex.william...@redhat.com>
> >
> >         2.
> >         Lu Baolu <baolu...@linux.intel.com>
> >
> >
> >
> > === Issue ===
> >
> > Kernel-log flooding is seen when an x86_64 L1 guest (Ubuntu-21) is
> > booted in qemu/kvm on an x86_64 host (Ubuntu-21), with a
> > host-pci-device attached.
> >
> > The following kind of logs, along with the stacktraces, cause the flood:
> >
> > ......
> >  DMAR: ERROR: DMA PTE for vPFN 0x428ec already set (to 3f6ec003 not 3f6ec003)
> >  DMAR: ERROR: DMA PTE for vPFN 0x428ed already set (to 3f6ed003 not 3f6ed003)
> >  DMAR: ERROR: DMA PTE for vPFN 0x428ee already set (to 3f6ee003 not 3f6ee003)
> >  DMAR: ERROR: DMA PTE for vPFN 0x428ef already set (to 3f6ef003 not 3f6ef003)
> >  DMAR: ERROR: DMA PTE for vPFN 0x428f0 already set (to 3f6f0003 not 3f6f0003)
> > ......
> >
> >
> >
> > === Current Behaviour, leading to the issue ===
> >
> > Currently, when we do a dma-unmapping, we unmap/unlink the mappings, but
> > the pte-entries are not cleared.
> >
> > Thus, the following sequence floods the kernel-logs:
> >
> > i)
> > A dma-unmapping makes the real/leaf-level pte-slot invalid, but the
> > pte-content itself is not cleared.
> >
> > ii)
> > Now, during some later dma-mapping procedure, as the pte-slot is about
> > to hold a new pte-value, the intel-iommu driver checks whether a prior
> > pte-entry already exists in the slot. If it does, it logs a kernel-error,
> > along with a corresponding stacktrace.
> >
> > iii)
> > Step ii) occurs over and over, and the kernel-logs get flooded.
> >
> >
> >
> > === Fix ===
> >
> > We ensure that, as part of a dma-unmapping, each (unmapped) pte-slot
> > is also cleared of its content (at the leaf level, where the real
> > iova => pfn mapping is stored).
> >
> > This completes a "deep" dma-unmapping.
> >
> >
> >
> > Signed-off-by: Ajay Garg <ajaygargn...@gmail.com>
> > ---
> >  drivers/iommu/intel/iommu.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> > index d75f59ae28e6..485a8ea71394 100644
> > --- a/drivers/iommu/intel/iommu.c
> > +++ b/drivers/iommu/intel/iommu.c
> > @@ -5090,6 +5090,8 @@ static size_t intel_iommu_unmap(struct iommu_domain *domain,
> >         gather->freelist = domain_unmap(dmar_domain, start_pfn,
> >                                         last_pfn, gather->freelist);
> >
> > +       dma_pte_clear_range(dmar_domain, start_pfn, last_pfn);
> > +
> >         if (dmar_domain->max_addr == iova + size)
> >                 dmar_domain->max_addr = iova;
> >
> > --
> > 2.30.2
> >
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu