Hi,

>>
>> Yeah but that is the requirement the HW has.
>>
>> I mean we can keep torturing the buddy allocator to give us 2M pages,
>> but essentially we want to get away from those specialized solutions
>> and have more of the functionality necessary to drive the HW in the
>> common Linux memory management code because that prevents vendors
>> from re-implementing that stuff in their specific driver over and
>> over again.
> 
> For the code at hand, if we insert an order 10 folio shmem will split
> it at writeout time but spit out a warning (if enabled) at the same
> time. For this particular use-case, I think it might make sense for the
> drivers that use direct insertion to cap the page-allocator orders to
> THP size (2M).

I think this just points at the bigger problem: shmem should be allocating
folios, not someone else on shmem's behalf.

> 
>>
>> Regards,
>> Christian.
>>
>>> c) You pass folio + order, which is just the red flag that you are doing
>>>    something extremely dodgy.
>>>
>>>    You just cast something that is not a folio, and was not allocated to be a
>>>    folio to a folio through page_folio(page). That will stop working completely
>>>    in the future once we decouple struct page from struct folio.
>>>
>>>    If it's not a folio with a proper set order, you should be passing page +
>>>    order.
>>>
>>> d) We are once more open-coding creation of a folio, by hand-crafting it
>>>    ourselves.
>>>
>>>    We have folio_alloc() and friends for a reason. Where we, for example, do a
>>>    page_rmappable_folio().
>>>
>>>    I am pretty sure that you are missing a call to page_rmappable_folio(),
>>>    resulting in the large folios not getting folio_set_large_rmappable() set.
>>>
>>> e) undo_compound_page(). No words.
>>>
>>>
>>>
>>> *maybe* it would be a little less bad if you would just allocate a compound page
>>> in your driver and use page_rmappable_folio() in there.
> 
> OK, yes it sounds like a prereq for this is that the driver actually
> allocates compound pages. It might be that the TTM comment about *not*
> doing that is stale, but need to check.
> 
> Would it be acceptable to export a function from core mm to split an
> isolated folio?

The point is: an allocated page, including an allocated compound page, is
logically not a folio. We have work going on to decouple both concepts 
completely.

We do have functions to split folios. But they should be given a proper folio, not
something that can currently be cast to a folio.

> 
>>>
>>> That wouldn't change a) or b), though.
>>>
>>>
>>>
>>> Good question.
>>> We'd have to keep swapoff and all of that working. For example, in
>>> try_to_unuse(), we special-case shmem_unuse() to handle non-anonymous pages.
>>>
>>> But then, the whole swapcache operates on folios ... so I am not sure if there
>>> is a lot to be won by re-implementing what shmem already does?
>>>
> 
> Still that would alleviate a) and b), right? At least as long as we
> keep folio sizes within the swap cache limits?

Let's hear from Christian what would be required for DRM to use shmem natively.
Maybe a possible solution would be a custom shmem-like internal thing
that can better deal with large folios.

-- 
Cheers,

David
