On 12/4/18 5:57 PM, John Hubbard wrote:
> On 12/4/18 5:44 PM, Jerome Glisse wrote:
>> On Tue, Dec 04, 2018 at 05:15:19PM -0800, Matthew Wilcox wrote:
>>> On Tue, Dec 04, 2018 at 04:58:01PM -0800, John Hubbard wrote:
>>>> On 12/4/18 3:03 PM, Dan Williams wrote:
>>>>> Except the LRU fields are already in use for ZONE_DEVICE pages... how
>>>>> does this proposal interact with those?
>>>>
>>>> Very badly: page->pgmap and page->hmm_data both get corrupted. Is there an entire
>>>> use case I'm missing: calling get_user_pages() on ZONE_DEVICE pages? Said another
>>>> way: is it reasonable to disallow calling get_user_pages() on ZONE_DEVICE pages?
>>>>
>>>> If we have to support get_user_pages() on ZONE_DEVICE pages, then the whole
>>>> LRU field approach is unusable.
>>>
>>> We just need to rearrange ZONE_DEVICE pages.  Please excuse the whitespace
>>> damage:
>>>
>>> +++ b/include/linux/mm_types.h
>>> @@ -151,10 +151,12 @@ struct page {
>>>  #endif
>>>                 };
>>>                 struct {        /* ZONE_DEVICE pages */
>>> +                       unsigned long _zd_pad_2;        /* LRU */
>>> +                       unsigned long _zd_pad_3;        /* LRU */
>>> +                       unsigned long _zd_pad_1;        /* uses mapping */
>>>                         /** @pgmap: Points to the hosting device page map. */
>>>                         struct dev_pagemap *pgmap;
>>>                         unsigned long hmm_data;
>>> -                       unsigned long _zd_pad_1;        /* uses mapping */
>>>                 };
>>>  
>>>                 /** @rcu_head: You can use this to free a page by RCU. */
>>>
>>> You don't use page->private or page->index, do you Dan?
>>
>> page->private and page->index are used by HMM device pages.
>>
> 
> OK, so for the ZONE_DEVICE + HMM case, that leaves just one field remaining for
> dma-pinned information. Which might work. To recap, we need:
> 
> -- 1 bit for PageDmaPinned
> -- 1 bit, if using LRU field(s), for PageDmaPinnedWasLru.
> -- N bits for a reference count
> 
> Those *could* be packed into a single 64-bit field, if really necessary.
> 

...actually, this needs to work on 32-bit systems as well, and HMM is already using
most of the available struct page fields. However, it is still possible for this to work.

Matthew, can I have that page bit now, please? I'm about out of options, and at this
point it will actually solve the problem here.

Given:

1) It's cheap to know whether a page is ZONE_DEVICE, and ZONE_DEVICE means not on
   the LRU (see the is_zone_device_page() sketch after this list). That, in turn,
   means only 1 bit (in addition to a counter) is required for that case, instead
   of 2 bits.

2) There is an independent page bit available (according to Matthew).

3) HMM uses 4 of the 5 struct page fields, so only one field is available for a
   counter in that case.

4) get_user_pages() must work on ZONE_DEVICE and HMM pages.

5) For a proper atomic counter on both 32- and 64-bit systems, we really do need a
   complete unsigned long field.
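As a quick sanity check on point 1: the ZONE_DEVICE test is just a zone number
comparison, roughly (trimmed down from what include/linux/mm.h already has today):

static inline bool is_zone_device_page(const struct page *page)
{
	return page_zonenum(page) == ZONE_DEVICE;
}

so there is no real cost to special-casing ZONE_DEVICE pages in the pinning path.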

So that leads to the following approach:

-- Use a single unsigned long field as an atomic reference count for the DMA pinned
count. For normal pages, this will be the *second* field of the LRU (in order to
avoid the PageTail bit).

For ZONE_DEVICE pages, we can also line up the fields so that the second LRU field is
available and reserved for this DMA pinned count. Basically _zd_pad_1 gets moved up
and optionally renamed:

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 017ab82e36ca..b5dcd9398cae 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -90,8 +90,8 @@ struct page {
                                 * are in use.
                                 */
                                struct {
-                                       unsigned long dma_pinned_flags;
-                                       atomic_t      dma_pinned_count;
+                                       unsigned long dma_pinned_flags; /* LRU.next */
+                                       atomic_t      dma_pinned_count; /* LRU.prev */
                                };
                        };
                        /* See page-flags.h for PAGE_MAPPING_FLAGS */
@@ -161,9 +161,9 @@ struct page {
                };
                struct {        /* ZONE_DEVICE pages */
                        /** @pgmap: Points to the hosting device page map. */
-                       struct dev_pagemap *pgmap;
-                       unsigned long hmm_data;
-                       unsigned long _zd_pad_1;        /* uses mapping */
+                       struct dev_pagemap *pgmap;      /* LRU.next */
+                       unsigned long _zd_pad_1;        /* LRU.prev or dma_pinned_count */
+                       unsigned long hmm_data;         /* uses mapping */
                };
 
                /** @rcu_head: You can use this to free a page by RCU. */



-- Use an additional, fully independent page bit (from Matthew) for PageDmaPinned.
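
To make the intent a bit more concrete, here is a rough, completely untested sketch of
how pin/unpin helpers could use that layout. The page_dma_pin()/page_dma_unpin() names
and the PageDmaPinned flag accessors are made up for this email (the flag would be
generated from Matthew's spare page bit), and the ordering questions around taking a
page off the LRU are waved away here:

static void page_dma_pin(struct page *page)
{
	/*
	 * First pin: set the (hypothetical) PageDmaPinned flag so the rest of
	 * mm can tell this page is pinned for DMA.
	 */
	if (atomic_inc_return(&page->dma_pinned_count) == 1)
		SetPageDmaPinned(page);
}

static void page_dma_unpin(struct page *page)
{
	/* Last unpin: clear the flag again. */
	if (atomic_dec_and_test(&page->dma_pinned_count))
		ClearPageDmaPinned(page);
}

Because _zd_pad_1 lines up with dma_pinned_count in the union above, the same helpers
would work for ZONE_DEVICE/HMM pages without touching pgmap or hmm_data.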


thanks,
-- 
John Hubbard
NVIDIA
