On Tue, 25 Mar 2025, Olivier Certner wrote:
Hi Olivier,
What is the drm code in question? ttm_pool_alloc -> ttm_pool_alloc_page()?
As all other uses of __GFP_NORETRY in 6.1 (ignoring drm_printf.c) seem to be
in i915.
Yes, this is indeed targeted at ttm_pool_alloc_page() which tries to allocate
contiguous pages to fill up the pool (but not in 5.10). TTM pools are used by
the amdgpu driver to build its translation tables.
Great; I think with two other work in progress bits the disabled
fallback in there could be used again very soon but that'll be a
discussion for another time and place (and likely people).
Calls to functions other than (linux_)alloc_pages*() are unaffected by the
change, and if you dig through all the references of __GFP_NORETRY/GFP_RETRY
(including those from files under 'selftests/'), you'll see the built GFP flags
are never used with (linux_)alloc_pages*(), except for the only reference you
mentioned.
Are you sure?
i915_gem_object_get_pages_internal() in drm-6.1 at least seems to
conditionally pass it down:
17 #define QUIET (__GFP_NORETRY | __GFP_NOWARN)
...
74 page = alloc_pages(gfp | (order ? QUIET :
MAYFAIL),
Seems it can deal with allocation failures, lowering order or returning
-ENOMEM from the function so should be fine hopefully.
Yes, I was aware of this piece of code, but obviously it cannot cause any
problem.
All calls to Linux's alloc_pages*() can fail *whatever* the passed GFP flags
except for GFP_NOFAIL (and that's the only exception).
Ah I see; that's where the M_WAITOK logic in there comes from. I was
wondering about it given others like GFP_KERNEL also being mapped to
that. Sometimes that makes me wonder if we should indeed pass the gfp_t
all the way down to the actual function doing the alloc and only there
filter and convert rather than doing that earlier in wrapper functions.
Callers always have to cope, and specifically when specifying __GFP_NORETRY it
would be foolish not too (and that wouldn't be allowed in Linus' tree anyway).
If it wasn't for that, i915_gem_object_get_pages_internal() does the same
lowering that ttm_pool_alloc_page() does anyway, as you noticed.
My sentence was indeed too strong, as I was still swapping in context for this
work which was done months ago now.
I know how that feels. I just started looking at something I've done a
year+ ago completely outside my domain.
I reviewed all callers not only for GFP_NORETRY but also for most others GFP
flags (I have tweaked grep files for all of them and over multiple Linux
versions), as I started some work to document what the Linux
guarantees/behaviors really are and then some other work to rationalize how we
translate them in FreeBSD (there seems to be several possible improvements
here). Unfortunately, I have stalled that last work for weeks now, and
probably will for a significant while.
Given Linux's contract on __GFP_NORETRY, it is arguably not reasonable to spend
time compacting memory on such calls, that's a deviation from what drivers are
supposed to expect.
Oh, and the rest of the commit message also doesn't mention that I also tested
this change on machines using the i915 driver, without observing any problem or
change in behavior.
Fantastic!
Thanks you so much for the follow-up. It clarifies a lot and makes sense and
is very helpful!
Lots of joy,
Bjoern
--
Bjoern A. Zeeb r15:7