On Thu, Apr 14, 2016 at 2:33 PM, Shaohua Li <[email protected]> wrote:

> On Thu, Apr 14, 2016 at 02:18:32PM -0700, Benjamin Serebrin wrote:
> > On Thu, Apr 14, 2016 at 2:05 PM, Adam Morrison <[email protected]>
> wrote:
> > > On Thu, Apr 14, 2016 at 9:26 PM, Benjamin Serebrin via iommu
> > > <[email protected]> wrote:
> > >
> > >> It was pointed out that DMA_32 or _24 (or any other non-64-bit size)
> > >> could be starved if the magazines on all cores are full and the depot
> > >> is empty.  (This gets more probable with increased core count.)  You
> > >> could try one more time: call free_iova_rcaches() and try alloc_iova
> > >> again before giving up.
> > >
> > > That's not safe, unfortunately.  free_iova_rcaches() is meant to be
> > > called only when the domain is dying and the CPUs won't access the
> > > rcaches.
> >
> > Fair enough.  Is it possible to make this safe, cleanly and without
> > adding too much locking in the common case?
> >
> > > It's tempting to make the rcaches work only for DMA_64 allocations.
> > > This would also solve the problem of respecting the pfn_limit when
> > > allocating, which Shaohua Li pointed out.  Sadly, intel-iommu.c
> > > converts DMA_64 to DMA_32 by default, apparently to avoid dual address
> > > cycles on the PCI bus.  I wonder about the importance of this, though,
> > > as it doesn't seem that anything equivalent happens when iommu=off.
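> > >
> > > For context, this is roughly the logic behind that default in the
> > > intel-iommu.c IOVA allocation path (simplified and from memory, so
> > > treat it as a sketch of the behaviour rather than the exact code):
> > >
> > >     if (!dmar_forcedac && dma_mask > DMA_BIT_MASK(32)) {
> > >         /* Prefer an IOVA below 4G so the device can use 32-bit
> > >          * (non-DAC) addressing on the bus. */
> > >         iova = alloc_iova(&domain->iovad, nrpages,
> > >                           IOVA_PFN(DMA_BIT_MASK(32)), 1);
> > >         if (iova)
> > >             return iova;
> > >     }
> > >     /* Only fall back to the device's full mask if the low 4G range
> > >      * is exhausted (or iommu=forcedac was given on the command line). */
> > >     iova = alloc_iova(&domain->iovad, nrpages, IOVA_PFN(dma_mask), 1);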
> >
> > I agree.  It's tempting to make all DMA_64 allocations grow up from
> > 4G, leaving the entire 32-bit space free for small allocations.  I'd
> > be willing to argue that this should be the default, with some
> > override for anyone who finds it objectionable.
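> >
> > A rough sketch of what I mean -- purely hypothetical: the
> > alloc_iova_range() helper below does not exist today, and the iova
> > allocator would need to grow a lower-bound parameter to support it:
> >
> >     if (dma_mask > DMA_BIT_MASK(32)) {
> >         /* 64-bit-capable device: grow upward from 4G
> >          * (hypothetical helper, not in today's iova.c)... */
> >         iova = alloc_iova_range(&domain->iovad, nrpages,
> >                                 IOVA_PFN(DMA_BIT_MASK(32)) + 1,
> >                                 IOVA_PFN(dma_mask));
> >         if (iova)
> >             return iova;
> >     }
> >     /* ...leaving the low 32-bit space free for DMA_32/DMA_24 devices,
> >      * and as a fallback if the range above 4G is exhausted. */
> >     iova = alloc_iova(&domain->iovad, nrpages,
> >                       IOVA_PFN(min_t(u64, dma_mask, DMA_BIT_MASK(32))), 1);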
> >
> > On PCIe, a dual address cycle really just means "4 more bytes in the
> > TLP header": a request with a 32-bit address has a 3-doubleword
> > (12-byte) header, while a 64-bit address takes 4 DW (16 bytes).
> > What's 25% of a read request between friends?  And every read request
> > is answered by a completion TLP (3 DW header plus the data), so the
> > extra address bytes are a small fraction of the aggregate bandwidth.
> > Similarly for writes, the additional address bytes don't cost a large
> > percentage.
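> >
> > For concreteness, a rough back-of-the-envelope (the 256-byte completion
> > payload is just an illustrative assumption, and link-layer framing is
> > ignored):
> >
> >     read request, 32-bit address:  3 DW header          = 12 bytes
> >     read request, 64-bit address:  4 DW header          = 16 bytes (+4, ~25%)
> >     read completion:               3 DW header + 256 B  = 268 bytes
> >
> >     extra bytes per request/completion pair: 4 / (16 + 268) ~= 1.4%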
> >
> > That being said, it's a rare device that needs more than 4GB of active
> > address space, and it's a rare system that needs to mix a
> > performance-critical DMA_32 (or DMA_24) device and a DMA_64 device in
> > the same page table.
>
> I'm not sure the TLP overhead is the whole story.  The IOMMU is not only
> for PCIe devices: with a PCIe-to-PCI-X/PCI bridge in between, any PCI
> device can sit behind it, and such a device might not handle DMA_64 at
> all.  DAC also has overhead for PCI-X devices, IIRC, which somebody
> might care about.  So let's not break such devices.
>
> Thanks,
> Shaohua
>

Thanks, Shaohua.

As Adam mentioned, in the iommu=off case there's no enforcement that keeps
any PCIe address below 4GB.  If you have a system with DRAM above 4GB and
you're using any of the iommu-disabled or 1:1 settings, devices already see
DMA addresses above 4GB today.  So the proposed change in allocation policy
would not add any new failure mode; we are already exposed to it.