On Tue, Mar 22, 2022 at 09:29:23AM -0600, Alex Williamson wrote:

> I'm still picking my way through the series, but the later compat
> interface doesn't mention this difference as an outstanding issue.
> Doesn't this difference need to be accounted in how libvirt manages VM
> resource limits?  

AFACIT, no, but it should be checked.

> AIUI libvirt uses some form of prlimit(2) to set process locked
> memory limits.

Yes, and ulimit does work fully. prlimit adjusts the value:

int do_prlimit(struct task_struct *tsk, unsigned int resource,
                struct rlimit *new_rlim, struct rlimit *old_rlim)
{
        rlim = tsk->signal->rlim + resource;
[..]
                if (new_rlim)
                        *rlim = *new_rlim;

Which vfio reads back here:

drivers/vfio/vfio_iommu_type1.c:        unsigned long pfn, limit = 
rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
drivers/vfio/vfio_iommu_type1.c:        unsigned long limit = 
rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;

And iommufd does the same read back:

        lock_limit =
                task_rlimit(pages->source_task, RLIMIT_MEMLOCK) >> PAGE_SHIFT;
        npages = pages->npinned - pages->last_npinned;
        do {
                cur_pages = atomic_long_read(&pages->source_user->locked_vm);
                new_pages = cur_pages + npages;
                if (new_pages > lock_limit)
                        return -ENOMEM;
        } while (atomic_long_cmpxchg(&pages->source_user->locked_vm, cur_pages,
                                     new_pages) != cur_pages);

So it does work essentially the same.

The difference is more subtle, iouring/etc puts the charge in the user
so it is additive with things like iouring and additively spans all
the users processes.

However vfio is accounting only per-process and only for itself - no
other subsystem uses locked as the charge variable for DMA pins.

The user visible difference will be that a limit X that worked with
VFIO may start to fail after a kernel upgrade as the charge accounting
is now cross user and additive with things like iommufd.

This whole area is a bit peculiar (eg mlock itself works differently),
IMHO, but with most of the places doing pins voting to use
user->locked_vm as the charge it seems the right path in today's
kernel.

Ceratinly having qemu concurrently using three different subsystems
(vfio, rdma, iouring) issuing FOLL_LONGTERM and all accounting for
RLIMIT_MEMLOCK differently cannot be sane or correct.

I plan to fix RDMA like this as well so at least we can have
consistency within qemu.

Thanks,
Jason
_______________________________________________
iommu mailing list
[email protected]
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Reply via email to