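NVMe read performance has increased considerably. Compared with the baseline kernel, fio read bandwidth is about 3.5% higher on the -proposed kernel and about 27.5% higher on the -proposed kernel booted with iommu.strict=0.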

-updates kernel
Run status group 0 (all jobs):
   READ: bw=1358MiB/s (1424MB/s), 1358MiB/s-1358MiB/s (1424MB/s-1424MB/s), io=133GiB (142GB), run=100009-100009msec

-proposed kernel
Run status group 0 (all jobs):
   READ: bw=1406MiB/s (1475MB/s), 1406MiB/s-1406MiB/s (1475MB/s-1475MB/s), io=137GiB (147GB), run=100005-100005msec

-proposed kernel with iommu.strict=0
Run status group 0 (all jobs):
   READ: bw=1732MiB/s (1816MB/s), 1732MiB/s-1732MiB/s (1816MB/s-1816MB/s), io=169GiB (182GB), run=100004-100004msec
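The relative changes between the three runs can be recomputed directly from the reported bandwidth figures. The short C sketch below is only a worked-arithmetic aid; the function names are illustrative and are not part of fio or the kernel.

```c
#include <assert.h>
#include <stdio.h>

/* Percentage change from a baseline bandwidth to a new one (both in MiB/s). */
double pct_change(double base_mib_s, double new_mib_s)
{
	assert(base_mib_s > 0.0);
	return (new_mib_s - base_mib_s) / base_mib_s * 100.0;
}

/* Data moved, in GiB, at bw_mib_s MiB/s sustained for run_ms milliseconds. */
double gib_transferred(double bw_mib_s, double run_ms)
{
	return bw_mib_s * (run_ms / 1000.0) / 1024.0;
}

/* Print the comparisons using the bandwidth figures from the three fio runs. */
void print_fio_summary(void)
{
	const double baseline = 1358.0, proposed = 1406.0, nonstrict = 1732.0;

	printf("proposed vs baseline:                 %+.1f%%\n",
	       pct_change(baseline, proposed));
	printf("proposed + iommu.strict=0 vs baseline: %+.1f%%\n",
	       pct_change(baseline, nonstrict));
	printf("iommu.strict=0 vs strict (proposed):   %+.1f%%\n",
	       pct_change(proposed, nonstrict));
	printf("io for the baseline run: %.1f GiB\n",
	       gib_transferred(baseline, 100009.0));
}
```

Calling print_fio_summary() reports about +3.5%, +27.5% and +23.2%, plus roughly 132.6 GiB moved in the baseline run, which matches the io=133GiB that fio prints (fio's bandwidth is simply io divided by runtime).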

** Tags removed: verification-needed-bionic verification-needed-cosmic
** Tags added: verification-done-bionic verification-done-cosmic

You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.

  Support non-strict iommu mode on arm64

Status in linux package in Ubuntu:
  Fix Committed
Status in linux source package in Bionic:
  Fix Committed
Status in linux source package in Cosmic:
  Fix Committed
Status in linux source package in Disco:
  Fix Committed

Bug description:
  The Intel IOMMU driver provides an option for strict mode. When strict
  mode is disabled, IOTLB flush operations may be batched, allowing the
  user to trade off isolation for improved performance. Ubuntu's kernel
  currently lacks an equivalent of this feature on ARM.

  There's a significant performance gain to be had by removing the need
  to flush the IOMMU TLB on every unmap on arm64. I'm seeing a 25%
  performance gain with fio reads on a single NVMe device.
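  To illustrate where the gain comes from, here is a toy model (not kernel code) that counts IOTLB invalidations for a run of unmaps. It assumes strict mode issues one synchronous invalidation per unmap, while non-strict mode queues freed IOVAs and invalidates once per batch. batch_size is a hypothetical queue depth, and details of the real flush queue, such as timer-driven draining, are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy model: number of IOTLB invalidations needed for n_unmaps unmaps.
 *  - strict:     invalidate synchronously after every unmap.
 *  - non-strict: queue freed IOVAs and invalidate once per batch_size
 *                entries (batch_size is a hypothetical queue depth).
 */
size_t count_iotlb_flushes(size_t n_unmaps, bool strict, size_t batch_size)
{
	size_t flushes = 0, pending = 0;

	assert(batch_size > 0);
	for (size_t i = 0; i < n_unmaps; i++) {
		if (strict) {
			flushes++;          /* one invalidation per unmap */
			continue;
		}
		if (++pending == batch_size) {
			flushes++;          /* one invalidation covers the whole batch */
			pending = 0;
		}
	}
	if (pending > 0)
		flushes++;                  /* drain the partially filled queue */
	return flushes;
}
```

  With 1000 unmaps, strict mode needs 1000 invalidations, while a batch size of 256 needs 4. The cost is isolation: until the batched invalidation runs, a device may still reach a freed page through a stale IOTLB entry.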

  On x86, this trade-off is controlled by the "intel_iommu=strict"
  parameter. Upstream now exposes an equivalent feature for ARM platforms
  via the "iommu.strict=[0|1]" parameter, while retaining strict mode as
  the default.
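  When verifying, it can help to confirm which mode a machine actually booted with. The helper below is a hypothetical sketch that scans a kernel command line string (for example, the contents of /proc/cmdline) for the iommu.strict=[0|1] token. It assumes strict mode when the token is absent, accepts only the literal values 0 and 1, and lets a later occurrence override an earlier one; the kernel's own parser may accept other spellings.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* True if c terminates a command line token. */
static bool ends_token(char c)
{
	return c == '\0' || isspace((unsigned char)c);
}

/* Hypothetical helper: does this command line leave the IOMMU in strict mode? */
bool cmdline_iommu_strict(const char *cmdline)
{
	static const char key[] = "iommu.strict=";
	bool strict = true;             /* strict mode is the default */
	const char *p = cmdline;

	assert(cmdline != NULL);
	while ((p = strstr(p, key)) != NULL) {
		const char *val = p + (sizeof(key) - 1);
		/* Match a whole token only, not e.g. "foo.iommu.strict=0". */
		bool token_start = (p == cmdline) || isspace((unsigned char)p[-1]);

		if (token_start && (val[0] == '0' || val[0] == '1') && ends_token(val[1]))
			strict = (val[0] == '1');   /* later occurrences override earlier ones */
		p = val;
	}
	return strict;
}
```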

  [Test Case]
  Run fio with the following config before applying the patches and
  collect the IOPS count. Run again after applying the patches. Finally,
  run a third time after adding iommu.strict=0 to the kernel command line.

  Performance should not regress after the update. Performance should
  further improve after adding iommu.strict=0 - but if it doesn't for
  some reason, that is not a regression.

  $ cat fio.rc
  loops = 10000
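  The pass/fail rule in the test case can be written down as a small check. In this sketch the noise tolerance is an assumption (the test case does not specify one), and the iommu.strict=0 result is reported but never treated as a failure, matching the statement that a missing improvement there is not a regression.

```c
#include <assert.h>
#include <stdbool.h>

struct verification_result {
	bool no_regression;     /* required: patched kernel, default (strict) mode */
	bool nonstrict_faster;  /* desirable but not required: iommu.strict=0 run */
};

/*
 * before, after, after_nonstrict: bandwidth of the three fio runs.
 * noise_pct: assumed run-to-run tolerance, in percent.
 */
struct verification_result check_fio_runs(double before, double after,
					  double after_nonstrict, double noise_pct)
{
	struct verification_result r;

	assert(before > 0.0 && after > 0.0 && noise_pct >= 0.0);
	r.no_regression = after >= before * (1.0 - noise_pct / 100.0);
	r.nonstrict_faster = after_nonstrict > after;
	return r;
}
```

  With the figures reported in this bug (1358, 1406 and 1732 MiB/s) and an assumed 2% tolerance, both fields are true.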


  44f6876a00e83 iommu/arm-smmu: Support non-strict mode
  b2dfeba654cb0 iommu/io-pgtable-arm-v7s: Add support for non-strict mode
  9662b99a19abc iommu/arm-smmu-v3: Add support for non-strict mode
  b6b65ca20bc93 iommu/io-pgtable-arm: Add support for non-strict mode
  68a6efe86f6a1 iommu: Add "iommu.strict" command line option
  2da274cdf998a iommu/dma: Add support for non-strict mode
  07fdef34d2be6 iommu/arm-smmu-v3: Implement flush_iotlb_all hook
  85c7a0f1ef624 iommu/io-pgtable-arm: Fix race handling in split_blk_unmap()

  [Regression Risk]
  Most of these patches are specific to ARM and have been regression
  tested on both arm64 (HiSilicon D06) and armhf (QEMU virt) using
  "stress-ng --vm $(nproc)".

  Two patches do touch architecture-independent code, however:

  > 68a6efe86f6a1 iommu: Add "iommu.strict" command line option
  Adds a new command line option and sets an attribute that IOMMU
  drivers can optionally react to. Does not change the default behavior.

  > 2da274cdf998a iommu/dma: Add support for non-strict mode
  This driver is only built for arm64 and ppc64el (determined by looking
  at the build logs). Most of this patch only changes behavior in the
  non-default (and new) iommu.strict=0 case. The exception, which is
  called out in the commit message, is this hunk:

  -       WARN_ON(iommu_unmap(domain, dma_addr, size) != size);
  +       WARN_ON(iommu_unmap_fast(domain, dma_addr, size) != size);
  +       if (!cookie->fq_domain)
  +               iommu_tlb_sync(domain);

  In the default case, where fq_domain will be NULL, we are now factoring
  iommu_unmap() into a call to iommu_unmap_fast() followed by
  iommu_tlb_sync().

  Looking at the source to iommu_unmap() confirms that this is
  functionally equivalent.
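  The equivalence argument can be illustrated with a simplified userspace model. The mock_* names and struct mock_domain are hypothetical; the model assumes that iommu_unmap() and iommu_unmap_fast() share one unmap routine and differ only in whether they sync the IOTLB afterwards, and it omits details such as the optional iotlb_sync driver callback. It only shows that the new path leaves the same state as the old one when fq_domain is NULL, and skips the sync when a flush queue is present.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct iommu_domain: just enough state to compare. */
struct mock_domain {
	size_t mapped;        /* bytes still mapped */
	unsigned int syncs;   /* IOTLB sync operations issued */
};

/* Shared routine: unmap, then optionally wait for the IOTLB invalidation. */
static size_t mock_unmap_common(struct mock_domain *d, size_t size, bool sync)
{
	assert(size <= d->mapped);
	d->mapped -= size;
	if (sync)
		d->syncs++;
	return size;
}

size_t mock_iommu_unmap(struct mock_domain *d, size_t size)
{
	return mock_unmap_common(d, size, true);
}

size_t mock_iommu_unmap_fast(struct mock_domain *d, size_t size)
{
	return mock_unmap_common(d, size, false);
}

void mock_iommu_tlb_sync(struct mock_domain *d)
{
	d->syncs++;
}

/* Old path: WARN_ON(iommu_unmap(...) != size). Returns false where it would warn. */
bool old_unmap_path(struct mock_domain *d, size_t size)
{
	return mock_iommu_unmap(d, size) == size;
}

/* New path: unmap without syncing, then sync unless a flush queue exists. */
bool new_unmap_path(struct mock_domain *d, size_t size, const void *fq_domain)
{
	bool ok = mock_iommu_unmap_fast(d, size) == size;

	if (!fq_domain)
		mock_iommu_tlb_sync(d);
	return ok;
}
```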
