On Tue, Mar 22, 2022 at 09:29:23AM -0600, Alex Williamson wrote:
> I'm still picking my way through the series, but the later compat
> interface doesn't mention this difference as an outstanding issue.
> Doesn't this difference need to be accounted in how libvirt manages VM
> resource limits?
AFACIT, no, but it should be checked.
> AIUI libvirt uses some form of prlimit(2) to set process locked
> memory limits.
Yes, and ulimit does work fully. prlimit adjusts the value:
int do_prlimit(struct task_struct *tsk, unsigned int resource,
struct rlimit *new_rlim, struct rlimit *old_rlim)
{
rlim = tsk->signal->rlim + resource;
[..]
if (new_rlim)
*rlim = *new_rlim;
Which vfio reads back here:
drivers/vfio/vfio_iommu_type1.c: unsigned long pfn, limit =
rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
drivers/vfio/vfio_iommu_type1.c: unsigned long limit =
rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
And iommufd does the same read back:
lock_limit =
task_rlimit(pages->source_task, RLIMIT_MEMLOCK) >> PAGE_SHIFT;
npages = pages->npinned - pages->last_npinned;
do {
cur_pages = atomic_long_read(&pages->source_user->locked_vm);
new_pages = cur_pages + npages;
if (new_pages > lock_limit)
return -ENOMEM;
} while (atomic_long_cmpxchg(&pages->source_user->locked_vm, cur_pages,
new_pages) != cur_pages);
So it does work essentially the same.
The difference is more subtle, iouring/etc puts the charge in the user
so it is additive with things like iouring and additively spans all
the users processes.
However vfio is accounting only per-process and only for itself - no
other subsystem uses locked as the charge variable for DMA pins.
The user visible difference will be that a limit X that worked with
VFIO may start to fail after a kernel upgrade as the charge accounting
is now cross user and additive with things like iommufd.
This whole area is a bit peculiar (eg mlock itself works differently),
IMHO, but with most of the places doing pins voting to use
user->locked_vm as the charge it seems the right path in today's
kernel.
Ceratinly having qemu concurrently using three different subsystems
(vfio, rdma, iouring) issuing FOLL_LONGTERM and all accounting for
RLIMIT_MEMLOCK differently cannot be sane or correct.
I plan to fix RDMA like this as well so at least we can have
consistency within qemu.
Thanks,
Jason
_______________________________________________
iommu mailing list
[email protected]
https://lists.linuxfoundation.org/mailman/listinfo/iommu