On 5/18/21 8:56 PM, Jane Chu wrote:
> On 5/18/2021 10:27 AM, Joao Martins wrote:
> 
>> On 5/5/21 11:36 PM, Joao Martins wrote:
>>> On 5/5/21 11:20 PM, Dan Williams wrote:
>>>> On Wed, May 5, 2021 at 12:50 PM Joao Martins <[email protected]> 
>>>> wrote:
>>>>> On 5/5/21 7:44 PM, Dan Williams wrote:
>>>>>> On Thu, Mar 25, 2021 at 4:10 PM Joao Martins <[email protected]> 
>>>>>> wrote:
>>>>>>> diff --git a/include/linux/memremap.h b/include/linux/memremap.h
>>>>>>> index b46f63dcaed3..bb28d82dda5e 100644
>>>>>>> --- a/include/linux/memremap.h
>>>>>>> +++ b/include/linux/memremap.h
>>>>>>> @@ -114,6 +114,7 @@ struct dev_pagemap {
>>>>>>>          struct completion done;
>>>>>>>          enum memory_type type;
>>>>>>>          unsigned int flags;
>>>>>>> +       unsigned long align;
>>>>>> I think this wants some kernel-doc above to indicate that non-zero
>>>>>> means "use compound pages with tail-page dedup" and zero / PAGE_SIZE
>>>>>> means "use non-compound base pages".
>> [...]
>>
>>>>>> The non-zero value must be
>>>>>> PAGE_SIZE, PMD_PAGE_SIZE or PUD_PAGE_SIZE.
>>>>>> Hmm, maybe it should be an
>>>>>> enum:
>>>>>>
>>>>>> enum devmap_geometry {
>>>>>>      DEVMAP_PTE,
>>>>>>      DEVMAP_PMD,
>>>>>>      DEVMAP_PUD,
>>>>>> }
>>>>>>
>>>>> I suppose a converter between devmap_geometry and page_size would be 
>>>>> needed too? And maybe
>>>>> the whole dax/nvdimm align values change meanwhile (as a followup 
>>>>> improvement)?
>>>> I think it is ok for dax/nvdimm to continue to maintain their align
>>>> value because it should be ok to have 4MB align if the device really
>>>> wanted. However, when it goes to map that alignment with
>>>> memremap_pages() it can pick a mode. For example, it's already the
>>>> case that dax->align == 1GB is mapped with DEVMAP_PTE today, so
>>>> they're already separate concepts that can stay separate.
>>>>
>>> Gotcha.
>> I am reconsidering part of the above. In general, yes, the meaning of devmap 
>> @align
>> represents a slightly different variation of the device @align i.e. how the 
>> metadata is
>> laid out **but** regardless of what kind of page table entries we use 
>> vmemmap.
>>
>> By using DEVMAP_PTE/PMD/PUD we might end up 1) duplicating what nvdimm/dax 
>> already
>> validates in terms of allowed device @align values (i.e. PAGE_SIZE, PMD_SIZE 
>> and PUD_SIZE)
>> 2) the geometry of metadata is very much tied to the value we pick to @align 
>> at namespace
>> provisioning -- not the "align" we might use at mmap() perhaps that's what 
>> you referred
>> above? -- and 3) the value of geometry actually derives from dax device 
>> @align because we
>> will need to create compound pages representing a page size of @align value.
>>
>> Using your example above: you're saying that dax->align == 1G is mapped with 
>> DEVMAP_PTEs,
>> in reality the vmemmap is populated with PMDs/PUDs page tables (depending on 
>> what archs
>> decide to do at vmemmap_populate()) and uses base pages as its metadata 
>> regardless of what
>> device @align. In reality what we want to convey in @geometry is not page 
>> table sizes, but
>> just the page size used for the vmemmap of the dax device. Additionally, 
>> limiting its
>> value might not be desirable... if tomorrow Linux for some arch supports 
>> dax/nvdimm
>> devices with 4M align or 64K align, the value of @geometry will have to 
>> reflect the 4M to
>> create compound pages of order 10 for the said vmemmap.
>>
>> I am going to wait until you finish reviewing the remaining four patches of 
>> this series,
>> but maybe this is a simple misnomer (s/align/geometry/) with a comment but 
>> without
>> DEVMAP_{PTE,PMD,PUD} enum part? Or perhaps its own struct with a value and 
>> enum a
>> setter/getter to audit its value? Thoughts?
> 
> Good points there.
> 
> My understanding is that  dax->align  conveys granularity of size while 
> carving out a namespace it's a geometry attribute loosely akin to sector size 
> of a spindle 
> disk.  I tend to think that device pagesize  has almost no relation to 
> "align" in that, it's 
> possible to have 1G "align" and 4K pagesize, or verse versa.  That is, with 
> the advent of compound page 
> support, it is possible to totally separate the two concepts.
> 
> How about adding a new option to "ndctl create-namespace" that describes 
> device creator's desired pagesize, and another parameter to describe whether 
> the pagesize shall 
> be fixed or allowed to be split up, such that, if the intention is to never 
> split up 2M pagesize, then it 
> would be possible to save a lot metadata space on the device?

Maybe that can be selected by the driver too, but it's an interesting point you 
raise
should we settle with the geometry (e.g. like a geometry sysfs entry IIUC your
suggestion?). device-dax for example would use geometry == align and therefore 
save space
(like what I propose in patch 10). But fsdax would retain the default that is 
geometry =
PAGE_SIZE and align = PMD_SIZE should it want to split pages.

Interestingly, devmap poisoning always occur at @align level regardless of 
@geometry.

What I am not sure is what value (vs added complexity) it brings to allow 
geometry *value*
to be selecteable by user given that so far we seem to only ever initialize 
metadata as
either sets of base pages [*] or sets of compound pages (of a size). And the 
difference
between both can possibly be summarized to split-ability like you say.

[*] that optionally can are morphed into compound pages by driver
_______________________________________________
Linux-nvdimm mailing list -- [email protected]
To unsubscribe send an email to [email protected]

Reply via email to