On Thu, Apr 14, 2016 at 2:33 PM, Shaohua Li <[email protected]> wrote:
> On Thu, Apr 14, 2016 at 02:18:32PM -0700, Benjamin Serebrin wrote:
> > On Thu, Apr 14, 2016 at 2:05 PM, Adam Morrison <[email protected]>
> > wrote:
> > > On Thu, Apr 14, 2016 at 9:26 PM, Benjamin Serebrin via iommu
> > > <[email protected]> wrote:
> > >
> > >> It was pointed out that DMA_32 or _24 (or any other non-64 size)
> > >> could be starved if the magazines on all cores are full and the
> > >> depot is empty. (This gets more probable with increased core
> > >> count.) You could try one more time: call free_iova_rcaches() and
> > >> try alloc_iova again before giving up.
> > >
> > > That's not safe, unfortunately. free_iova_rcaches() is meant to be
> > > called only when the domain is dying and the CPUs won't access the
> > > rcaches.
> >
> > Fair enough. Is it possible to make this safe, cleanly and without
> > too much locking in the normal case?
> >
> > > It's tempting to make the rcaches work only for DMA_64 allocations.
> > > This would also solve the problem of respecting the pfn_limit when
> > > allocating, which Shaohua Li pointed out. Sadly, intel-iommu.c
> > > converts DMA_64 to DMA_32 by default, apparently to avoid dual
> > > address cycles on the PCI bus. I wonder about the importance of
> > > this, though, as it doesn't seem that anything equivalent happens
> > > when iommu=off.
> >
> > I agree. It's tempting to make all DMA_64 allocations grow up from
> > 4G, leaving the entire 32-bit space free for small allocations. I'd
> > be willing to argue that that should be the default, with some
> > override for anyone who finds it objectionable.
> >
> > Dual address cycle is really "4 more bytes in the TLP header" on
> > PCIe; a 32-bit address takes 3 doublewords (12 bytes) while a 64-bit
> > address takes 4 DW (16 bytes). What's 25% of a read request between
> > friends? And every read request has a read response, a 3DW TLP plus
> > its data, so the aggregate bandwidth consumed is getting
> > uninteresting. Similarly for writes, the additional address bytes
> > don't cost a large percentage.
> >
> > That being said, it's a rare device that needs more than 4GB of
> > active address space, and it's a rare system that needs to mix a
> > performance-critical DMA_32 (or _24) device and a _64 device in the
> > same page table.
>
> I'm not sure about the TLP overhead. The IOMMU is not only for PCIe
> devices: there are PCIe-to-PCI-X/PCI bridges, and any PCI device can
> reside behind one. Such a device might not handle DMA_64, and DAC has
> overhead for PCI-X devices, IIRC, which somebody might care about. So
> let's not break such devices.
>
> Thanks,
> Shaohua
Thanks, Shaohua. As Adam mentioned, in the iommu=off case there's no
enforcement that keeps any PCIe address below 4GB. If you have a system
with DRAM addresses above 4GB and you're using any of the IOMMU-disabled
or 1:1 settings, devices already see DMA addresses above 4GB today. So
the proposed change in allocation policy would not add any new failure
mode; we are already exposed to it.
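
To put numbers on the DAC overhead question, here's the back-of-the-
envelope arithmetic as a throwaway userspace C program. It counts TLP
header and payload bytes only (no DLLPs, framing, or ECRC), and the
256-byte completion payload is just an assumed read-completion boundary,
so treat it as a sketch rather than a model:

/*
 * Back-of-the-envelope PCIe read overhead: SAC (3DW) vs DAC (4DW)
 * request headers. Counts TLP header + payload bytes only; DLLPs,
 * framing, and ECRC are ignored. 256-byte completions are an assumed
 * read-completion boundary.
 */
#include <stdio.h>

#define DW        4u            /* one doubleword = 4 bytes */
#define SAC_HDR   (3 * DW)      /* 32-bit address request header */
#define DAC_HDR   (4 * DW)      /* 64-bit address request header */
#define CPL_HDR   (3 * DW)      /* completions always use 3DW headers */
#define CPL_BYTES 256ul         /* assumed completion payload size */

/* Total bytes for one read: request TLP + completion TLPs + data. */
static unsigned long read_txn_bytes(unsigned long read_len, unsigned hdr)
{
	unsigned long ncpl = (read_len + CPL_BYTES - 1) / CPL_BYTES;

	return hdr + ncpl * CPL_HDR + read_len;
}

int main(void)
{
	static const unsigned long lens[] = { 64, 256, 4096 };
	unsigned int i;

	for (i = 0; i < sizeof(lens) / sizeof(lens[0]); i++) {
		unsigned long sac = read_txn_bytes(lens[i], SAC_HDR);
		unsigned long dac = read_txn_bytes(lens[i], DAC_HDR);

		printf("%5lu-byte read: SAC %4lu B, DAC %4lu B (+%.2f%%)\n",
		       lens[i], sac, dac, 100.0 * (dac - sac) / sac);
	}
	return 0;
}

Even for a 64-byte read the extra 4 bytes are under 5% of the bytes on
the wire, and for a 4KB read they are below 0.1%, which is the
aggregate-bandwidth point made above. None of this speaks to
conventional PCI/PCI-X behind a bridge, where Shaohua's concern still
stands.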

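On the starvation concern at the top of the thread, a safe
flush-and-retry looks doable if the per-CPU caches can be drained under
their own locks instead of via the teardown-only free_iova_rcaches().
The sketch below is hypothetical and written against the shape of the
patch under review: free_cpu_cached_iovas() is an assumed helper (it
would take one CPU's magazine lock and return that CPU's cached ranges
to the rbtree), and iova_rcache_get() is the patchset's per-CPU
fast-path lookup.

/*
 * Hypothetical allocation path with a one-shot rcache flush on
 * failure. free_cpu_cached_iovas() is assumed to drain one CPU's
 * magazines back into the iova rbtree under that CPU's lock, so other
 * CPUs can keep using their caches while it runs.
 */
static unsigned long
alloc_iova_flush_retry(struct iova_domain *iovad, unsigned long size,
		       unsigned long limit_pfn)
{
	bool flushed_rcache = false;
	unsigned long iova_pfn;
	struct iova *new_iova;

	/* Fast path: take a cached range from this CPU's magazines. */
	iova_pfn = iova_rcache_get(iovad, size, limit_pfn);
	if (iova_pfn)
		return iova_pfn;

retry:
	new_iova = alloc_iova(iovad, size, limit_pfn, true);
	if (!new_iova) {
		unsigned int cpu;

		if (flushed_rcache)
			return 0;

		/*
		 * Slow path failed: return every CPU's cached ranges
		 * to the rbtree, then retry the allocation exactly once.
		 */
		flushed_rcache = true;
		for_each_online_cpu(cpu)
			free_cpu_cached_iovas(cpu, iovad);
		goto retry;
	}

	return new_iova->pfn_lo;
}

Draining only on the failure path keeps the common case untouched, and
the one-shot flushed_rcache flag bounds the retry so a genuinely
exhausted address space still fails cleanly.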