Hi,

>>
>> Yeah but that is the requirement the HW has.
>>
>> I mean we can keep torturing the buddy allocator to give us 2M pages,
>> but essentially we want to get away from those specialized solutions
>> and have more of the functionality necessary to drive the HW in the
>> common Linux memory management code because that prevents vendors
>> from re-implementing that stuff in their specific driver over and
>> over again.
>
> For the code at hand, if we insert an order 10 folio shmem will split
> it at writeout time but spit out a warning (if enabled) at the same
> time. For this particular use-case, I think it might make sense for the
> drivers that use direct insertion to cap the page-allocator orders to
> THP size (2M).
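
For reference, the order arithmetic behind that suggestion can be sketched in plain userspace C. This is only an illustration, assuming 4 KiB base pages (so order 10 is 4 MiB and a 2 MiB THP is order 9); the helper names are made up for this example and are not kernel APIs.

```c
#include <stddef.h>

/* Assumption: 4 KiB base pages (page shift 12), as in a typical x86-64 config. */
#define BASE_PAGE_SHIFT 12u

/* Hypothetical helper: number of bytes covered by an allocation of this order. */
static size_t order_to_bytes(unsigned int order)
{
    return (size_t)1 << (BASE_PAGE_SHIFT + order);
}

/* Hypothetical helper: largest order whose size does not exceed limit_bytes. */
static unsigned int max_order_for_size(size_t limit_bytes)
{
    unsigned int order = 0;

    /* Stop before the shift would overflow size_t. */
    while (BASE_PAGE_SHIFT + order + 1 < sizeof(size_t) * 8 &&
           order_to_bytes(order + 1) <= limit_bytes)
        order++;
    return order;
}

/* Hypothetical helper: clamp a requested order to the 2 MiB THP size. */
static unsigned int cap_order_to_thp(unsigned int requested)
{
    unsigned int cap = max_order_for_size((size_t)2 << 20);

    return requested < cap ? requested : cap;
}
```

Under these assumptions an order 10 request is clamped to order 9, while smaller orders pass through unchanged.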

I think this just points at the bigger problem: shmem should be allocating
folios, not someone else on shmem's behalf.

>
>>
>> Regards,
>> Christian.
>>
>>> c) You pass folio + order, which is just the red flag that you are doing
>>> something extremely dodgy.
>>>
>>> You just cast something that is not a folio, and was not allocated to be a
>>> folio to a folio through page_folio(page). That will stop working completely
>>> in the future once we decouple struct page from struct folio.
>>>
>>> If it's not a folio with a proper set order, you should be passing page +
>>> order.
>>>
>>> d) We are once more open-coding creation of a folio, by hand-crafting it
>>> ourselves.
>>>
>>> We have folio_alloc() and friends for a reason. Where we, for example, do a
>>> page_rmappable_folio().
>>>
>>> I am pretty sure that you are missing a call to page_rmappable_folio(),
>>> resulting in the large folios not getting folio_set_large_rmappable() set.
>>>
>>> e) undo_compound_page(). No words.
>>>
>>>
>>>
>>> *maybe* it would be a little less bad if you would just allocate a
>>> compound page in your driver and use page_rmappable_folio() in there.
>
> OK, yes it sounds like a prereq for this is that the driver actually
> allocates compound pages. It might be that the TTM comment about *not*
> doing that is stale, but need to check.
>
> Would it be acceptable to export a function from core mm to split an
> isolated folio?

The point is: an allocated page, including an allocated compound page, is
logically not a folio. We have work going on to decouple both concepts
completely.

We do have functions to split folios. But it should be given a proper folio,
not something that can currently be cast to a folio.

>
>>>
>>> That wouldn't change a) or b), though.
>>>
>>>
>>>
>>> Good question.
>>> We'd have to keep swapoff and all of that working. For example, in
>>> try_to_unuse(), we special-case shmem_unuse() to handle non-anonymous pages.
>>>
>>> But then, the whole swapcache operates on folios ... so I am not sure if
>>> there is a lot to be won by re-implementing what shmem already does?
>>>
>
> Still that would alleviate a) and b), right? At least as long as we
> keep folio sizes within the swap cache limits?

Let's hear from Christian what would be required for DRM to use shmem
natively.

Maybe there would be a possible solution to have a custom shmem-like
internal thing that can better deal with large folios.

--
Cheers,

David
