On 11/17/21 10:43, Christoph Hellwig wrote:
> On Fri, Nov 12, 2021 at 04:08:24PM +0100, Joao Martins wrote:
>> Use the newly added compound devmap facility which maps the assigned dax
>> ranges as compound pages at a page size of @align.
>>
>> dax devices are created with a fixed @align (huge page size) which is
>> also enforced at mmap() of the device. Consequently, faults happen at
>> the @align specified at creation time, and that doesn't change
>> throughout the dax device lifetime. MCEs unmap a whole dax huge page,
>> as do splits, which occur at the configured page size.
>>
>> Performance measured by gup_test improves considerably for
>> unpin_user_pages() and altmap with NVDIMMs:
>>
>> $ gup_test -f /dev/dax1.0 -m 16384 -r 10 -S -a -n 512 -w
>> (pin_user_pages_fast 2M pages) put:~71 ms -> put:~22 ms
>> [altmap]
>> (pin_user_pages_fast 2M pages) get:~524ms put:~525ms -> get:~127ms put:~71ms
>>
>> $ gup_test -f /dev/dax1.0 -m 129022 -r 10 -S -a -n 512 -w
>> (pin_user_pages_fast 2M pages) put:~513 ms -> put:~188 ms
>> [altmap with -m 127004]
>> (pin_user_pages_fast 2M pages) get:~4.1 secs put:~4.12 secs -> get:~1sec put:~563ms
>>
>> .. as well as making unpin_user_page_range_dirty_lock() just as
>> effective as it is for THP/hugetlb[0] pages.
>>
>> [0] https://lore.kernel.org/linux-mm/[email protected]/
>>
>> Signed-off-by: Joao Martins <[email protected]>
>> Reviewed-by: Dan Williams <[email protected]>
>> ---
>> drivers/dax/device.c | 57 ++++++++++++++++++++++++++++++++++----------
>> 1 file changed, 44 insertions(+), 13 deletions(-)
>>
>> diff --git a/drivers/dax/device.c b/drivers/dax/device.c
>> index a65c67ab5ee0..0c2ac97d397d 100644
>> --- a/drivers/dax/device.c
>> +++ b/drivers/dax/device.c
>> @@ -192,6 +192,42 @@ static vm_fault_t __dev_dax_pud_fault(struct dev_dax *dev_dax,
>> }
>> #endif /* !CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD */
>>
>> +static void set_page_mapping(struct vm_fault *vmf, pfn_t pfn,
>> + unsigned long fault_size,
>> + struct address_space *f_mapping)
>> +{
>> + unsigned long i;
>> + pgoff_t pgoff;
>> +
>> + pgoff = linear_page_index(vmf->vma, ALIGN(vmf->address, fault_size));
>> +
>> + for (i = 0; i < fault_size / PAGE_SIZE; i++) {
>> + struct page *page;
>> +
>> + page = pfn_to_page(pfn_t_to_pfn(pfn) + i);
>> + if (page->mapping)
>> + continue;
>> + page->mapping = f_mapping;
>> + page->index = pgoff + i;
>> + }
>> +}
>
> No need to pass f_mapping here, it must be vmf->vma->vm_file->f_mapping.
>
Hmmm, good point -- if I keep this structure then yeah, I will nuke
@f_mapping. Although if I move the @mapping setting to before
vmf_insert_pfn*() (as Jason suggests), then the @f_mapping argument
might still be useful for clearing it on @rc != VM_FAULT_NOPAGE.
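For illustration, roughly something like this (untested sketch;
clear_page_mapping() is a hypothetical undo helper, and the actual
vmf_insert_*() call depends on the fault size):

	/*
	 * Untested sketch: set page->mapping/->index before inserting
	 * the PTE/PMD/PUD, and undo it if the insert didn't succeed.
	 */
	set_page_mapping(vmf, pfn, fault_size, vmf->vma->vm_file->f_mapping);
	rc = vmf_insert_mixed(vmf->vma, vmf->address, pfn);
	if (rc != VM_FAULT_NOPAGE)
		clear_page_mapping(vmf, pfn, fault_size);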
>> +static void set_compound_mapping(struct vm_fault *vmf, pfn_t pfn,
>> + unsigned long fault_size,
>> + struct address_space *f_mapping)
>> +{
>> + struct page *head;
>> +
>> + head = pfn_to_page(pfn_t_to_pfn(pfn));
>> + head = compound_head(head);
>> + if (head->mapping)
>> + return;
>> +
>> + head->mapping = f_mapping;
>> + head->index = linear_page_index(vmf->vma,
>> + ALIGN(vmf->address, fault_size));
>> +}
>
> Same here.
>
/me nods
>> if (rc == VM_FAULT_NOPAGE) {
>> - unsigned long i;
>> - pgoff_t pgoff;
>> + struct dev_pagemap *pgmap = dev_dax->pgmap;
>
> If you're touching this anyway: why not do an early return here for
> the error case to simplify the flow?
>
Yeah. I was thinking of doing that, i.e. calling set_compound_mapping()
earlier or even inside set_page_mapping(). Albeit the latter would need
a new argument (the dev_dax).
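i.e. roughly along these lines (untested, and assuming the @f_mapping
argument is dropped per the earlier comment; the pgmap field used to
gate the compound case is whatever this series settles on):

	if (rc != VM_FAULT_NOPAGE)
		return rc;

	/* compound case vs. base pages */
	if (pgmap->vmemmap_shift)
		set_compound_mapping(vmf, pfn, fault_size);
	else
		set_page_mapping(vmf, pfn, fault_size);

	return rc;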