On Fri, 3 Jun 2011, Alex Williamson wrote:

> On Fri, 2011-06-03 at 20:31 +0100, David Woodhouse wrote:
> > Tell me it isn't so...
> 
> Looks accurate to me, in fact, with hugetlbfs it seems like it's doing
> exactly what it should do.  The non-hugetlbfs case isn't efficient, but
> it isn't wrong either.  Our only other option is to figure out what's
> contiguous, map those chunks and try to keep track of all that. 

That's not hard. You're already iterating over the pages. Instead of 
unconditionally mapping one page at a time, you have simple logic in the 
loop to do something like:
        if (this_hpa == first_hpa + nr_pages * page_size)
                nr_pages++;
        else {
                iommu_map(first_hpa, nr_pages);
                first_hpa = this_hpa;
                nr_pages = 1;
        }
(priming first_hpa/nr_pages from the first page, and issuing one final 
iommu_map(first_hpa, nr_pages) after the loop to flush the last pending 
range).

This gives you a fairly simple way to spot contiguous ranges, avoid *some* 
of the extra cache and IOTLB flushes, and perhaps give us a chance to 
*opportunistically* use superpages. (You could avoid all the extra flushes 
if you give us a sglist, but that's probably more hassle than it's worth.)
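For concreteness, here's a minimal user-space rendering of that loop. It's a
sketch, not the real KVM/VFIO code: instead of calling iommu_map() it just
records each contiguous run in an array, PAGE_SIZE is fixed at 4KiB, and the
struct and function names are invented for illustration:

```c
#include <stddef.h>

#define PAGE_SIZE 4096UL

struct range {
	unsigned long hpa;	/* start of the contiguous run */
	unsigned long nr_pages;	/* number of 4KiB pages in it */
};

/*
 * Coalesce the host physical addresses in hpa[] into contiguous runs,
 * recording each run as a single range (where the real code would issue
 * one iommu_map() per run).  Returns the number of ranges found.
 */
static size_t map_coalesced(const unsigned long *hpa, size_t n,
			    struct range *out)
{
	unsigned long first_hpa = 0, nr_pages = 0;
	size_t i, nranges = 0;

	for (i = 0; i < n; i++) {
		if (nr_pages && hpa[i] == first_hpa + nr_pages * PAGE_SIZE) {
			/* extends the current run */
			nr_pages++;
		} else {
			/* discontiguity: flush the pending run, start anew */
			if (nr_pages) {
				out[nranges].hpa = first_hpa;
				out[nranges].nr_pages = nr_pages;
				nranges++;
			}
			first_hpa = hpa[i];
			nr_pages = 1;
		}
	}
	if (nr_pages) {		/* flush the final pending run */
		out[nranges].hpa = first_hpa;
		out[nranges].nr_pages = nr_pages;
		nranges++;
	}
	return nranges;
}
```

Three contiguous 4KiB pages followed by two more at a distance collapse into
just two map operations instead of five.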

Of course, if we do opportunistically use superpages, we're going to have 
to be able to break them. Currently your API just says "unmap whatever 
page size you happen to find in the PTE here, and tell me what it was", 
which will hurt if you really only mean to unmap 4KiB for ballooning, when 
we'd spotted that we could map a whole 2MiB page there.

(Unless the IOMMU keeps track *separately* of the page size used when 
mapping each range, somewhere outside the page tables themselves. But no. 
You *have* that information, in kvm_host_page_size(vma). So give it.)
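To make the splitting concern concrete, here's a toy model (nothing like the
real VT-d page-table code; the struct and names are invented): a 2MiB
"superpage" that has to be shattered into 512 separate 4KiB entries the
moment an unmap covers only part of it, e.g. a single ballooned 4KiB page:

```c
#include <stddef.h>

#define K4	4096UL
#define M2	(512 * K4)

/* One toy PMD-level entry: either a single 2MiB superpage mapping,
 * or 512 individual 4KiB entries. */
struct toy_pmd {
	int is_superpage;
	unsigned char present[512];	/* used only when !is_superpage */
};

/*
 * Unmap [off, off + len) within this 2MiB region, with off/len both
 * 4KiB-aligned.  If the region is currently mapped with a superpage and
 * the request covers only part of it, break the superpage into 4KiB
 * entries first, then clear just the requested ones.
 */
static void toy_unmap(struct toy_pmd *pmd, unsigned long off,
		      unsigned long len)
{
	size_t i;

	if (pmd->is_superpage) {
		if (off == 0 && len == M2) {
			/* whole region: just drop the superpage */
			pmd->is_superpage = 0;
			for (i = 0; i < 512; i++)
				pmd->present[i] = 0;
			return;
		}
		/* partial unmap: shatter into 512 present 4KiB entries */
		pmd->is_superpage = 0;
		for (i = 0; i < 512; i++)
			pmd->present[i] = 1;
	}
	for (i = off / K4; i < (off + len) / K4; i++)
		pmd->present[i] = 0;
}
```

The point of the sketch: the split step only works if the unmap path is
*told* how much to unmap, rather than being asked to unmap whatever-sized
mapping it happens to find.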

-- 
dwmw2
