On Tue, Apr 19, 2016 at 1:47 PM, Shaohua Li <[email protected]> wrote:
> On Tue, Apr 19, 2016 at 07:48:16PM +0300, Adam Morrison wrote:
> > This patchset improves the scalability of the Intel IOMMU code by
> > resolving two spinlock bottlenecks, yielding up to ~5x performance
> > improvement and approaching iommu=off performance.
> >
> > For example, here's the throughput obtained by 16 memcached instances
> > running on a 16-core Sandy Bridge system, accessed using memslap on
> > another machine that has iommu=off, using the default memslap config
> > (64-byte keys, 1024-byte values, and 10%/90% SET/GET ops):
> >
> >   stock iommu=off:
> >     990,803 memcached transactions/sec (=100%, median of 10 runs).
> >   stock iommu=on:
> >     221,416 memcached transactions/sec (=22%).
> >     [61.70%  0.63%  memcached  [kernel.kallsyms]  [k] _raw_spin_lock_irqsave]
> >   patched iommu=on:
> >     963,159 memcached transactions/sec (=97%).
> >     [ 1.29%  1.10%  memcached  [kernel.kallsyms]  [k] _raw_spin_lock_irqsave]
> >
> > The two resolved spinlocks:
> >
> >  - Deferred IOTLB invalidations are batched in a global data structure
> >    and serialized under a spinlock (add_unmap() & flush_unmaps()); this
> >    patchset batches IOTLB invalidations in a per-CPU data structure.
> >
> >  - IOVA management (alloc_iova() & __free_iova()) is serialized under
> >    the rbtree spinlock; this patchset adds per-CPU caches of allocated
> >    IOVAs so that the rbtree doesn't get accessed frequently.  (Adding a
> >    cache above the existing IOVA allocator is less intrusive than
> >    dynamic identity mapping and helps keep IOMMU page table usage low;
> >    see Patch 7.)
> >
> > The paper "Utilizing the IOMMU Scalably" (presented at the 2015 USENIX
> > Annual Technical Conference) contains many more details and experiments:
> >
> >   https://www.usenix.org/system/files/conference/atc15/atc15-paper-peleg.pdf
> >
> > v3:
> >  * Patch 7/7: Respect the caller-passed limit IOVA when satisfying an
> >    IOVA allocation from the cache.
>
> Thanks, looks good. I'm still thinking to have 2 caches, one for DMA32
> and the other for DMA64. Mixing them in one cache might make allocation
> from cache have more failure. But we can do this later if it is a real
> problem. So for the whole series:
>
> Reviewed-by: Shaohua Li <[email protected]>
I agree, looks great.  I gave one slight comment on a comment in 7/7,
but after that:

Reviewed-by: Ben Serebrin <[email protected]>

Thanks for doing all of this!
_______________________________________________
iommu mailing list
[email protected]
https://lists.linuxfoundation.org/mailman/listinfo/iommu
