On Mon, 4 May 2026 at 23:23, Marco Elver <[email protected]> wrote:
>
> On Thu, Apr 30, 2026 at 03:03PM +0200, Vlastimil Babka (SUSE) wrote:
> > On 4/24/26 15:24, Marco Elver wrote:
> >
> > > @@ -938,7 +968,7 @@ void *__kmalloc_large_node_noprof(size_t size, gfp_t flags, int node)
> > > * Try really hard to succeed the allocation but fail
> > > * eventually.
> > > */
> > > -static __always_inline __alloc_size(1) void *kmalloc_noprof(size_t size, gfp_t flags)
> > > +static __always_inline __alloc_size(1) void *_kmalloc_noprof(size_t size, gfp_t flags, kmalloc_token_t token)
> > > {
> > > if (__builtin_constant_p(size) && size) {
> > > unsigned int index;
> > > @@ -948,14 +978,16 @@ static __always_inline __alloc_size(1) void *kmalloc_noprof(size_t size, gfp_t f
> > >
> > > index = kmalloc_index(size);
> > > return __kmalloc_cache_noprof(
> > > - kmalloc_caches[kmalloc_type(flags, _RET_IP_)][index],
> > > + kmalloc_caches[kmalloc_type(flags, token)][index],
> >
> > While reviewing this, it occurred to me we might have been using _RET_IP_
> > here in a suboptimal way ever since this was introduced. Since this is all
> > inlined, shouldn't we have been using _THIS_IP_ to really randomize using
> > the kmalloc() callsite, and not its parent?
> >
> > And after this patch, we get the token passed to _kmalloc_noprof()...
> >
> > > flags, size);
> > > }
> > > - return __kmalloc_noprof(size, flags);
> > > + return __kmalloc_noprof(PASS_KMALLOC_PARAMS(size, NULL, token), flags);
> >
> > ... and used also here for the non-constant-size, where previously
> > __kmalloc_noprof() (not inline function) would correctly use _RET_IP_ on its
> > own ...
> >
> > > }
> > > +#define kmalloc_noprof(...) _kmalloc_noprof(__VA_ARGS__, __kmalloc_token(__VA_ARGS__))
> >
> > ... and the token comes from here. With random partitioning that's
> > #define __kmalloc_token(...) ((kmalloc_token_t){ .v = _RET_IP_ })
> >
> > so that AFAIK makes the situation worse as now the cases without constant
> > size also start randomizing by the parent callsite and not the kmalloc
> > callsite.
> >
> > But there are many users of __kmalloc_token() and maybe some are correct in
> > using _RET_IP_. I haven't checked; maybe we'll need two variants, or to
> > change things around further.
>
> Good catch. I don't think we need multiple variants (otherwise the TYPED
> variant would be broken) - we're moving token generation to the callers
> (not even inlined anymore) with all this macro magic.
>
> I think this is all we need:
>
> --- a/include/linux/slab.h
> +++ b/include/linux/slab.h
> @@ -503,7 +503,7 @@ int kmem_cache_shrink(struct kmem_cache *s);
> typedef struct { unsigned long v; } kmalloc_token_t;
> #ifdef CONFIG_KMALLOC_PARTITION_RANDOM
> extern unsigned long random_kmalloc_seed;
> -#define __kmalloc_token(...) ((kmalloc_token_t){ .v = _RET_IP_ })
> +#define __kmalloc_token(...) ((kmalloc_token_t){ .v = _THIS_IP_ })
> #elif defined(CONFIG_KMALLOC_PARTITION_TYPED)
> #define __kmalloc_token(...) ((kmalloc_token_t){ .v = __builtin_infer_alloc_token(__VA_ARGS__) })
> #endif
>
> Plus a paragraph in the commit message. Let me add that.

Bah, this is why it doesn't work:

>> drivers/gpu/drm/msm/msm_gpu.c:272:4: error: cannot jump from this indirect goto statement to one of its possible targets
     272 | drm_exec_retry_on_contention(&exec);
         | ^
   include/drm/drm_exec.h:123:4: note: expanded from macro 'drm_exec_retry_on_contention'
     123 | goto *__drm_exec_retry_ptr; \
         | ^
   drivers/gpu/drm/msm/msm_gpu.c:304:16: note: possible target of indirect goto statement
     304 | state->bos = kcalloc(submit->nr_bos,
         | ^
   include/linux/slab.h:1173:34: note: expanded from macro 'kcalloc'
    1173 | #define kcalloc(n, size, flags) kmalloc_array(n, size, (flags) | __GFP_ZERO)
         | ^
   include/linux/slab.h:1133:42: note: expanded from macro 'kmalloc_array'
    1133 | #define kmalloc_array(...) alloc_hooks(kmalloc_array_noprof(__VA_ARGS__))
         | ^
   include/linux/slab.h:1132:71: note: expanded from macro 'kmalloc_array_noprof'
    1132 | #define kmalloc_array_noprof(...) _kmalloc_array_noprof(__VA_ARGS__, __kmalloc_token(__VA_ARGS__))
         | ^
   include/linux/slab.h:506:55: note: expanded from macro '__kmalloc_token'
     506 | #define __kmalloc_token(...) ((kmalloc_token_t){ .v = _THIS_IP_ })
         | ^
   include/linux/instruction_pointer.h:10:41: note: expanded from macro '_THIS_IP_'
      10 | #define _THIS_IP_ ({ __label__ __here; __here: (unsigned long)&&__here; })
         | ^
   drivers/gpu/drm/msm/msm_gpu.c:304:16: note: jump enters a statement expression

Apparently _THIS_IP_ takes the address of a label, which makes that label
a possible target of every indirect goto in the same function. Because
the label lives inside a statement expression, jumping to it would be
invalid, so the compiler rejects the code. This is obviously nonsense,
because the actual indirect jump in this GPU driver code would never
jump to the _THIS_IP_ __here label, but that's what it is.

Given this pre-existing issue, we probably need to continue using
_RET_IP_, as before. I tried to fix _THIS_IP_, but it's incredibly
brittle (e.g. an __always_inline function returning the address of a
label doesn't work on Clang, but would on GCC).
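
For illustration, here is a minimal userspace sketch (not kernel code) of the
_RET_IP_ behaviour discussed above. RET_IP and THIS_IP are assumed stand-ins
for the kernel macros, and token_from_ret_ip(), two_callsites() and
callsites_share_ret_ip_token() are hypothetical names made up for this
example. It relies on the GCC/Clang extensions for statement expressions,
labels as values, and __builtin_return_address().

```c
#include <stdint.h>

/* Assumed userspace equivalents of the kernel's _RET_IP_ and _THIS_IP_. */
#define RET_IP  ((uintptr_t)__builtin_return_address(0))
#define THIS_IP ({ __label__ here; here: (uintptr_t)&&here; })

/* Hypothetical stand-in for an __always_inline allocator wrapper. */
static inline __attribute__((always_inline)) uintptr_t token_from_ret_ip(void)
{
	/*
	 * Once inlined, this is the return address of the enclosing
	 * function, i.e. the parent of the "allocation" callsite.
	 */
	return RET_IP;
}

/* One function containing two distinct "allocation" callsites. */
__attribute__((noinline))
void two_callsites(uintptr_t ret_tokens[2], uintptr_t this_tokens[2])
{
	ret_tokens[0] = token_from_ret_ip();
	ret_tokens[1] = token_from_ret_ip();

	/*
	 * Label addresses are expected to differ per expansion, but the
	 * compiler does not strictly guarantee that, so it is not checked.
	 */
	this_tokens[0] = THIS_IP;
	this_tokens[1] = THIS_IP;
}

/* Returns 1 if both inlined callsites produced the same RET_IP token. */
int callsites_share_ret_ip_token(void)
{
	uintptr_t ret_tokens[2], this_tokens[2];

	two_callsites(ret_tokens, this_tokens);
	return ret_tokens[0] == ret_tokens[1];
}
```

Both RET_IP tokens come out equal because the wrapper is inlined into
two_callsites(), so they identify that function's caller rather than the
individual callsites. The THIS_IP expansions each take the address of a label
inside a statement expression, which is the construct Clang objects to when
the same function also contains an indirect goto.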