On Tue, 25 Mar 2025, Olivier Certner wrote:

The branch main has been updated by olce:

URL: https://cgit.FreeBSD.org/src/commit/?id=718d1928f8748fe4429c011296f94f194d63c695

commit 718d1928f8748fe4429c011296f94f194d63c695
Author:     Mathieu <sig...@gmail.com>
AuthorDate: 2024-11-14 00:24:02 +0000
Commit:     Olivier Certner <o...@freebsd.org>
CommitDate: 2025-03-25 08:41:44 +0000

   LinuxKPI: make linux_alloc_pages() honor __GFP_NORETRY

   This is to fix slowdowns with drm-kmod that get worse over time as
   physical memory becomes more fragmented (and probably also depending on
   other factors).

   Based on information posted in this bug report:
   https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=277476

   By default, linux_alloc_pages() retries failed allocations by calling
   vm_page_reclaim_contig() to attempt to free contiguous physical memory
   pages. vm_page_reclaim_contig() does not always succeed and calling it
   can be very slow even when it fails. When physical memory is very
   fragmented, vm_page_reclaim_contig() can end up being called (and
   failing) after every allocation attempt. This could cause very
   noticeable graphical desktop hangs (which could last seconds).
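To make the cost concrete, here is a minimal C model of a retry-on-failure path. It is not the LinuxKPI code: model_alloc_contig() and model_reclaim_contig() are hypothetical stand-ins for vm_page_alloc_noobj_contig() and vm_page_reclaim_contig(), and the "reclaim, then retry once if reclaim succeeded" step is an assumed simplification of the real loop. With memory modeled as fully fragmented, every failed sleepable request pays for one reclaim attempt:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Call counters that make the cost of the retry path visible. */
static unsigned int alloc_calls, reclaim_calls;

/* Hypothetical stand-in for vm_page_alloc_noobj_contig(): models fully
 * fragmented memory, so every contiguous request fails. */
static void *
model_alloc_contig(unsigned int npages)
{
	assert(npages > 0);
	alloc_calls++;
	return NULL;
}

/* Hypothetical stand-in for vm_page_reclaim_contig(): slow in reality;
 * here it only records that it ran and reports failure. */
static bool
model_reclaim_contig(unsigned int npages)
{
	(void)npages;
	reclaim_calls++;
	return false;
}

/* Assumed simplification of the default path: a failed request that may
 * sleep tries a reclaim, and retries the allocation only if it worked. */
static void *
model_alloc_pages(unsigned int npages, bool may_sleep)
{
	void *p = model_alloc_contig(npages);

	if (p == NULL && may_sleep && model_reclaim_contig(npages))
		p = model_alloc_contig(npages);
	return p;
}
```

Driving it with 100 sleepable 16-page requests gives 100 allocation attempts and 100 reclaim calls, i.e. one expensive reclaim per failure; a non-sleepable request makes one attempt and no reclaim call.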

   The drm-kmod code in question attempts to allocate multiple contiguous
   pages at once but does not actually require them to be contiguous. It
   can fall back to doing multiple smaller allocations when larger
   allocations fail. It passes alloc_pages() the __GFP_NORETRY flag in this
   case.
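The fallback pattern described above can be sketched as follows. This is an illustrative C model, not the drm-kmod code: model_alloc_order(), fill_chunk() and max_ok_order are hypothetical, MODEL_GFP_NORETRY only mirrors the bit value the patch assigns, and the allocator simply succeeds up to a configurable order. Higher orders are requested as fail-fast attempts because a failure there is cheap to recover from by dropping to a smaller order; the caller gives up with -ENOMEM only when order 0 fails too.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Mirrors the bit the patch assigns to __GFP_NORETRY (name is hypothetical). */
#define MODEL_GFP_NORETRY (1U << 25)

/* Hypothetical allocator: succeeds only for orders <= max_ok_order, i.e.
 * only small contiguous runs remain. -1 means nothing succeeds. */
static int max_ok_order = 0;
static char fake_chunk;

static void *
model_alloc_order(unsigned int flags, unsigned int order)
{
	(void)flags;	/* a real allocator would skip reclaim on NORETRY */
	return ((int)order <= max_ok_order) ? &fake_chunk : NULL;
}

/* Try the largest order first as a fail-fast request, then fall back to
 * smaller orders; return -ENOMEM only if order 0 fails as well. */
static int
fill_chunk(unsigned int start_order, unsigned int *got_order)
{
	assert(got_order != NULL);
	for (unsigned int order = start_order;; order--) {
		unsigned int flags = order > 0 ? MODEL_GFP_NORETRY : 0;

		if (model_alloc_order(flags, order) != NULL) {
			*got_order = order;
			return 0;
		}
		if (order == 0)
			return -ENOMEM;
	}
}
```

With max_ok_order set to 2, fill_chunk(4, &got) succeeds with got == 2; with max_ok_order set to -1 it returns -ENOMEM.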

What is the drm code in question?  ttm_pool_alloc() -> ttm_pool_alloc_page()?
I ask because all other uses of __GFP_NORETRY in 6.1 (ignoring drm_printf.c)
seem to be in i915.


   This patch makes linux_alloc_pages() fail early (without retrying) when
   this flag is passed.

   [olce: The problem this patch fixes is longer and longer GUI freezes as
   a machine's memory gets filled and becomes fragmented, when using amdgpu
   from DRM kmod 5.15 and DRM kmod 6.1 (DRM kmod 5.10 is unaffected; newer
   Linux kernels introduced an "optimization" by which a pool of pages is
   filled preferentially with contiguous pages, which triggered the problem
   for us).  The original commit message above mentions freezes lasting
   seconds, but I occasionally witnessed some lasting tens of minutes,
   rendering a machine completely useless.

   The patch has been reviewed for its potential impact on other LinuxKPI
   parts and our existing DRM kmods' code.  In particular, there is no
   other user of __GFP_NORETRY/GFP_NORETRY with Linux's alloc_pages*()
   functions in our tree or DRM kmod ports.

Are you sure?

i915_gem_object_get_pages_internal() in drm-6.1 at least seems to
conditionally pass it down:

     #define QUIET (__GFP_NORETRY | __GFP_NOWARN)
     ...
                             page = alloc_pages(gfp | (order ? QUIET : MAYFAIL),

It seems to handle allocation failures, either by lowering the order or by
returning -ENOMEM from the function, so it should hopefully be fine.



   It has also been tested extensively, by me for months against 14-STABLE
   and sporadically on -CURRENT on an RX580, and by several others as
   reported below and as visible in more detail in the quoted bugzilla
   PR and in the initial drm-kmod issue at
   https://github.com/freebsd/drm-kmod/issues/302, on a variety of other
   AMD GPUs (several RX580, RX570, Radeon Pro WX5100, Green Sardine 5600G,
   Ryzen 9 4900H with embedded Renoir).]

   PR:             277476
   Reported by:    Josef 'Jeff' Sipek <jef...@josefsipek.net>
   Reviewed by:    olce
   Tested by:      many (olce, Pierre Pronchery, Evgenii Khramtsov, chaplina, rk)
   MFC after:      2 weeks
   Relnotes:       yes
   Sponsored by:   The FreeBSD Foundation (review and part of testing)
---
sys/compat/linuxkpi/common/include/linux/gfp.h | 4 ++--
sys/compat/linuxkpi/common/src/linux_page.c    | 3 ++-
2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/sys/compat/linuxkpi/common/include/linux/gfp.h b/sys/compat/linuxkpi/common/include/linux/gfp.h
index bd8fa1a18372..35dbe3e2a436 100644
--- a/sys/compat/linuxkpi/common/include/linux/gfp.h
+++ b/sys/compat/linuxkpi/common/include/linux/gfp.h
@@ -43,7 +43,6 @@
#define __GFP_NOWARN    0
#define __GFP_HIGHMEM   0
#define __GFP_ZERO      M_ZERO
-#define        __GFP_NORETRY   0
#define __GFP_NOMEMALLOC 0
#define __GFP_RECLAIM   0
#define __GFP_RECLAIMABLE   0
@@ -57,7 +56,8 @@
#define __GFP_KSWAPD_RECLAIM    0
#define __GFP_WAIT      M_WAITOK
#define __GFP_DMA32     (1U << 24) /* LinuxKPI only */
-#define        __GFP_BITS_SHIFT 25
+#define        __GFP_NORETRY   (1U << 25) /* LinuxKPI only */
+#define        __GFP_BITS_SHIFT 26
#define __GFP_BITS_MASK ((1 << __GFP_BITS_SHIFT) - 1)
#define __GFP_NOFAIL    M_WAITOK

diff --git a/sys/compat/linuxkpi/common/src/linux_page.c b/sys/compat/linuxkpi/common/src/linux_page.c
index bece8c954bfd..b5a0d34a6ad7 100644
--- a/sys/compat/linuxkpi/common/src/linux_page.c
+++ b/sys/compat/linuxkpi/common/src/linux_page.c
@@ -117,7 +117,8 @@ linux_alloc_pages(gfp_t flags, unsigned int order)
                        page = vm_page_alloc_noobj_contig(req, npages, 0, pmax,
                            PAGE_SIZE, 0, VM_MEMATTR_DEFAULT);
                        if (page == NULL) {
-                               if (flags & M_WAITOK) {
+                               if ((flags & (M_WAITOK | __GFP_NORETRY)) ==
+                                   M_WAITOK) {
                                        int err = vm_page_reclaim_contig(req,
                                            npages, 0, pmax, PAGE_SIZE, 0);
                                        if (err == ENOMEM)
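As a sanity check of the two hunks, the following C sketch restates the patched flag values and the new reclaim condition under renamed macros (the LKPI_, MODEL_ and OLD_ prefixes are made up here to avoid reserved double-underscore identifiers). The M_WAITOK value is assumed to be FreeBSD's 0x0002 from sys/malloc.h; the check only needs it to be a distinct low bit. The static assertions show why __GFP_BITS_SHIFT had to grow to 26: with the old value of 25, __GFP_BITS_MASK stops just short of bit 25.

```c
#include <assert.h>
#include <stdbool.h>

/* Patched gfp.h values, renamed (LKPI_ prefix is hypothetical). */
#define LKPI_GFP_DMA32      (1U << 24)
#define LKPI_GFP_NORETRY    (1U << 25)
#define LKPI_GFP_BITS_SHIFT 26
#define LKPI_GFP_BITS_MASK  ((1 << LKPI_GFP_BITS_SHIFT) - 1)

/* Pre-patch mask, for comparison: shift 25 covers bits 0..24 only. */
#define OLD_GFP_BITS_MASK   ((1 << 25) - 1)

/* Assumption: FreeBSD's M_WAITOK value from sys/malloc.h. */
#define MODEL_M_WAITOK      0x0002U

_Static_assert((LKPI_GFP_NORETRY & LKPI_GFP_BITS_MASK) != 0,
    "bit 25 lies inside the new mask");
_Static_assert((LKPI_GFP_NORETRY & OLD_GFP_BITS_MASK) == 0,
    "bit 25 would have fallen outside the old mask");
_Static_assert((LKPI_GFP_NORETRY & (LKPI_GFP_DMA32 | MODEL_M_WAITOK)) == 0,
    "NORETRY overlaps neither DMA32 nor M_WAITOK");

/* The patched condition: reclaim only when the caller may sleep
 * (M_WAITOK set) and has not asked to fail fast (NORETRY clear). */
static bool
should_reclaim(unsigned int flags)
{
	return (flags & (MODEL_M_WAITOK | LKPI_GFP_NORETRY)) == MODEL_M_WAITOK;
}
```

should_reclaim() is true only for a sleepable request without the no-retry bit; M_WAITOK | NORETRY, NORETRY alone and 0 all skip vm_page_reclaim_contig(), while unrelated bits such as DMA32 do not affect the outcome.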


--
Bjoern A. Zeeb                                                     r15:7
