On 2021-06-08 08:50, chenxiang (M) wrote:
Hi Robin,
On 2021/6/7 19:23, Robin Murphy wrote:
On 2021-06-07 03:42, chenxiang wrote:
From: Xiang Chen <[email protected]>
When the driver of the last device in a group is removed with rmmod,
the cached IOVAs are no longer used, so it is better to free them to
save memory. Also export the functions free_rcache_cached_iovas() and
iommu_domain_to_iova().
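For reference, the two exported helpers presumably look something like
this (signatures guessed from the patch subjects below, not copied from
the series itself):

  /* iova.c: drop all IOVAs held in a domain's per-CPU and depot
   * rcache magazines */
  void free_rcache_cached_iovas(struct iova_domain *iovad);

  /* dma-iommu.c: return the iova_domain backing an IOMMU domain's
   * DMA cookie, or NULL if it has none */
  struct iova_domain *iommu_domain_to_iova(struct iommu_domain *domain);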
How common is it to use a device a significant amount, then unbind its
driver and not use it for long enough to care about? Off-hand I can't
think of a particularly realistic use-case which looks like that - the
closest I can imagine is unbinding a default kernel driver to replace
it with VFIO, but I would expect the set of devices intended for
assignment to be distinct from the set of devices used by the host
itself, and thus the host driver wouldn't have actually done much to
generate a lot of DMA mappings in that initial period. Is my
expectation there wrong?
It is possible that a user uses a driver for a while and then removes
it. For example, the SAS card is the one we always use, but sometimes
we need to compare its performance against a RAID card, so we insmod
the RAID driver and rmmod it after the test. In that situation the
RAID card's rcache IOVAs remain allocated even after the driver is
removed (users generally rmmod the driver rather than removing the
device, which would release the group and all of its resources).
OK, so my question is how many end users are doing that on production
systems with distro kernels, such that this feature deserves being
supported and maintained in mainline?
If there is such a case, how much memory does this actually save in
practice? The theoretical worst-case should be roughly 4 * 128 * 6 *
sizeof(struct iova) bytes per CPU, which is around 192KB assuming
64-byte kmem cache alignment.
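Spelled out as a back-of-envelope sketch (IOVA_MAG_SIZE and
IOVA_RANGE_CACHE_MAX_SIZE match drivers/iommu/iova.c of this era; the
factor of 4 magazines attributable to each CPU is simply taken from
the estimate above):

  #define IOVA_RANGE_CACHE_MAX_SIZE  6    /* cached size orders */
  #define IOVA_MAG_SIZE              128  /* entries per magazine */
  #define MAGS_PER_CPU               4    /* assumed, per the 4x above */
  #define IOVA_KMEM_SIZE             64   /* sizeof(struct iova), aligned */

  /* 4 * 128 * 6 * 64 = 196608 bytes, i.e. ~192KB per CPU */
  static unsigned long rcache_worst_case_per_cpu(void)
  {
          return MAGS_PER_CPU * IOVA_MAG_SIZE *
                 IOVA_RANGE_CACHE_MAX_SIZE * IOVA_KMEM_SIZE;
  }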
There are 128 CPUs on our ARM64 Kunpeng 920 board. I added a local
debugfs interface, drop_rcache, for every group, which calls
free_rcache_cached_iovas(), and we can see that it saves 1~2MB of
memory after freeing the cached IOVAs.
estuary:/$ free
total used free shared buffers cached
Mem: 526776720 1336216 525440504 177664 0 177664
-/+ buffers/cache: 1158552 525618168
Swap: 0 0 0
estuary:/sys/kernel/debug/iommu/iovad/iommu_domain29$ echo 1 > drop_rcache
estuary:/sys/kernel/debug/iommu/iovad/iommu_domain29$ free
total used free shared buffers cached
Mem: 526776720 1334672 525442048 177664 0 177664
-/+ buffers/cache: 1157008 525619712
Swap: 0 0 0
Erm, that's about 1.5MB total across 128 CPUs - isn't 12KB from 4GB
(per CPU) basically just noise? :/
Sure, it might start to look a bit more significant on a quad-core TV
box with 1GB of RAM total, but then something like that isn't ever going
to be driving a big multi-queue SAS adapter to churn that many DMA
mappings in the first place.
Consider that a "memory saving" feature which only has some minimal
benefit for niche use-cases on very large systems, but costs a constant
amount of memory in terms of code and symbol bloat in the kernel image,
even on actually-memory-constrained systems, is rather counterproductive
overall. As I said before, I do like the general idea *if* there's a
significant saving to be had from it, but what you're showing us here
suggests that there isn't :(
Note also that on arm64 systems, commit 65688d2a05de ("arm64: cache:
Lower ARCH_DMA_MINALIGN to 64 (L1_CACHE_BYTES)") queued in linux-next is
already likely to cut everything in half here.
The other reason I want to free the rcache IOVAs at a suitable point
is the IOVA longterm issue[0] (which is unrecoverable). If we free the
cached IOVAs when the driver is removed, or add a debugfs interface to
do it, then we can at least recover by rmmod-ing and re-insmod-ing the
driver, or by writing to the debugfs drop_rcache, even though that
doesn't actually solve the issue.
Or perhaps we could, y'know, actually solve that problem, rather than
invent hacks to bodge around it. I've lost track of where we got to with
that thread (sorry!), but I think it still needs someone to look into
the relative timing of all the flush queue interactions if we're to
really understand what's going on, and if a simple obvious fix doesn't
fall out of that then it probably makes sense to convert the rcache
depot to a garbage-collected list (per the original concept).
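A hypothetical sketch of that idea - delayed work that reaps surplus
depot magazines after a grace period. All names here are made up for
illustration (iova_depot_hypo etc.), not the actual iova.c structures:

  #include <linux/jiffies.h>
  #include <linux/slab.h>
  #include <linux/spinlock.h>
  #include <linux/workqueue.h>

  struct iova_magazine_hypo {
          struct iova_magazine_hypo *next;
          unsigned long pfns[128];        /* cached IOVA PFNs */
  };

  struct iova_depot_hypo {
          spinlock_t lock;
          struct iova_magazine_hypo *head;
          unsigned int count;
          struct delayed_work gc_work;    /* rearmed whenever a magazine
                                           * is pushed onto the depot */
  };

  /* Reap one surplus magazine per pass; rearm while any remain, so an
   * idle depot drains back to empty over a few grace periods. */
  static void depot_gc_fn(struct work_struct *work)
  {
          struct iova_depot_hypo *depot =
                  container_of(work, struct iova_depot_hypo, gc_work.work);
          struct iova_magazine_hypo *mag = NULL;

          spin_lock(&depot->lock);
          if (depot->head) {
                  mag = depot->head;
                  depot->head = mag->next;
                  depot->count--;
          }
          if (depot->count)
                  schedule_delayed_work(&depot->gc_work,
                                        msecs_to_jiffies(100));
          spin_unlock(&depot->lock);

          kfree(mag);
  }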
Much as I think that purging the rcaches automatically to paper over
that issue isn't a great idea, making the user do it manually is
certainly even worse.
Robin.
[0]
https://lore.kernel.org/linux-iommu/[email protected]/
However it seems rather unlikely in practice to have every possible
cache entry of every size used, so if saving smaller amounts of memory
is a concern wouldn't you also want to explicitly drain the flush
queues (16KB per CPU) and maybe look at trying to reclaim the unused
pagetable pages from the domain itself - that ~192KB worth of cached
IOVAs represents ~32K pages worth of IOVA space, which for an
implementation like io-pgtable-arm with the 4KB granule means ~256KB
worth of non-leaf pagetables left behind.
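To unpack those figures (same assumptions as the ~192KB estimate
above, i.e. 4 * 128 entries per CPU in each of the 6 size orders):

  pages spanned:   4 * 128 * (1 + 2 + 4 + 8 + 16 + 32) = 32256, ~32K
  IOVA space:      32256 * 4KB ~= 126MB with a 4KB granule
  leaf tables:     each last-level table maps 512 * 4KB = 2MB, so
                   ~63 tables * 4KB ~= 252KB, i.e. the ~256KB above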
Ok, we will consider those.
I'm not against the idea of having a mechanism to "compact" an idle
DMA domain if there are convincing numbers to back it up, but the
actual implementation would need to be better than this as well -
having the IOMMU core code poking directly into the internals of the
iommu-dma abstraction is not the way to go, and exporting anything to
modules, which the IOMMU core is not, smells suspicious.
Robin.
Xiang Chen (4):
iommu/iova: add a function to free all rcached iovas and export it
iommu/iova: use function free_rcache_cached_iovas() to free all
rcached iovas
dma-iommu: add an interface to get iova_domain from iommu domain
iommu: free cached iovas when rmmod the driver of the last device in
the group
drivers/iommu/dma-iommu.c | 7 +++++++
drivers/iommu/iommu.c | 7 +++++++
drivers/iommu/iova.c | 17 ++++++++++++-----
include/linux/dma-iommu.h | 6 ++++++
include/linux/iova.h | 5 +++++
5 files changed, 37 insertions(+), 5 deletions(-)