On Tue, Jan 20, 2026 at 05:00:28PM +0100, Arnd Bergmann wrote:
> On Tue, Jan 20, 2026, at 16:10, Lorenzo Stoakes wrote:
> > On Tue, Jan 20, 2026 at 09:36:19AM -0400, Jason Gunthorpe wrote:
> >
> > I am not sure about this 'idiomatic kernel style' thing either, it feels
> > rather conjured. Yes you wouldn't ordinarily pass something larger than a
> > register size by-value, but here the intent is for it to be inlined anyway
> > right?
> >
> > It strikes me that the key optimisation here is the inlining, now if the
> > issue is that ye olde compiler might choose not to inline very small
> > functions (seems unlikely) we could always throw in an __always_inline?
>
> I can think of three specific things going wrong with structures passed
> by value:

I mean now you seem to be talking about it _in general_ which, _in theory_,
kills the whole concept of bitmap VMA flags _altogether_ really, or at
least any workable version of them.

But... no.

I'm not going to not do this because of perceived possible issues with ppc
and mips.

It's not reasonable to hold up a necessary change for the future of the
kernel IMO, and we can find workarounds as necessary should anything
problematic actually occur in practice.

I am happy to do so as maintainer of this work :)

>
> - functions that cannot be inlined are bound by the ELF ABI, and
>   several of them require structs to be passed on the stack regardless
>   of the size. Most of the popular architectures seem fine here, but
>   mips and powerpc look like they are affected.

I explicitly checked mips and it seemed fine, but I have not gone super deep.

>
> - The larger the struct is, the more architectures are affected.
>   Parts of the amdgpu driver and the bcachefs file system ran into this

bcachefs is not in the kernel. We don't care about out-of-tree stuff by
convention.

amdgpu is more concerning, but...

>   with 64-bit structures passed by value on 32-bit architectures
>   causing horrible codegen even with inlining. I think it's
>   usually fine up to a single register size.

...for one, 32-bit kernels are not ones where you would anticipate
incredible performance; for another, if any significant issues arise we can
look at arch-specific workarounds.

I already have vma_flags_*_word*() helpers to do things 'the old way' in
the worst case. More can be added if and when anything arises.
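
To make the 'old way' concrete, here is a rough userspace sketch of what a
word-based fallback can look like. The fb_* names and the two-word layout
are hypothetical, made up purely for illustration; they are not the actual
helpers or the actual type. The point is only that such helpers take a
pointer and a plain unsigned long, so no struct is ever passed by value:

```c
#include <assert.h>

/* Hypothetical two-word flags bitmap; not the kernel's definition. */
#define FB_NUM_WORDS 2

typedef struct {
	unsigned long bits[FB_NUM_WORDS];
} fb_flags_t;

/* OR a legacy single-word mask into the first word of the bitmap. */
static inline void fb_flags_set_word(fb_flags_t *flags, unsigned long mask)
{
	flags->bits[0] |= mask;
}

/* Clear a legacy single-word mask from the first word of the bitmap. */
static inline void fb_flags_clear_word(fb_flags_t *flags, unsigned long mask)
{
	flags->bits[0] &= ~mask;
}

/* Read the first word back as a plain unsigned long. */
static inline unsigned long fb_flags_get_word(const fb_flags_t *flags)
{
	return flags->bits[0];
}

/* Usage: only word 0 is touched, higher words are left alone. */
void fb_usage(void)
{
	fb_flags_t flags = { { 0x1UL, 0x8UL } };

	fb_flags_set_word(&flags, 0x6UL);
	assert(fb_flags_get_word(&flags) == 0x7UL);

	fb_flags_clear_word(&flags, 0x2UL);
	assert(fb_flags_get_word(&flags) == 0x5UL);
	assert(flags.bits[1] == 0x8UL);
}
```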

Again, I don't think we should hold up the rest of the kernel (being able
to transition to not being arbitrarily limited by VMA flag count is very
important) on this basis.

Also I've checked 32-bit code generation which _seemed_ fine at a
glance. Of course again I've not gone super deep on that.

>
> - clang's inlining algorithm works the other way round from gcc's:
>   inlining into the root caller first and sometimes leaving tiny
>   leaf function out of line unless you add __always_inline.

I already __always_inline all pertinent functions so hopefully that should
be no issue.
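
For anyone following along, the pattern under discussion looks roughly like
the userspace sketch below. Everything named demo_* is hypothetical and
only stands in for the real thing; demo_always_inline is my assumption of a
userspace equivalent of the kernel's __always_inline for gcc and clang. The
struct is wider than a register and passed by value, but the helpers are
forced inline so no out-of-line call (and hence no ABI-mandated struct
copy) should be involved:

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical bitmap flags type, wider than one register. */
#define DEMO_NUM_FLAGS     128
#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define DEMO_NUM_WORDS \
	((DEMO_NUM_FLAGS + DEMO_BITS_PER_LONG - 1) / DEMO_BITS_PER_LONG)

typedef struct {
	unsigned long bits[DEMO_NUM_WORDS];
} demo_flags_t;

/* Assumed userspace stand-in for the kernel's __always_inline. */
#define demo_always_inline inline __attribute__((__always_inline__))

/* Take the flags by value, return a copy with one more bit set. */
static demo_always_inline demo_flags_t demo_flags_set(demo_flags_t flags,
						      unsigned int bit)
{
	flags.bits[bit / DEMO_BITS_PER_LONG] |=
		1UL << (bit % DEMO_BITS_PER_LONG);
	return flags;
}

/* Take the flags by value, report whether the given bit is set. */
static demo_always_inline bool demo_flags_test(demo_flags_t flags,
					       unsigned int bit)
{
	return (flags.bits[bit / DEMO_BITS_PER_LONG] &
		(1UL << (bit % DEMO_BITS_PER_LONG))) != 0;
}

/* Usage: bit 70 lives beyond the first word on 32-bit and 64-bit longs. */
void demo_usage(void)
{
	demo_flags_t flags = { { 0 } };

	flags = demo_flags_set(flags, 70);
	assert(demo_flags_test(flags, 70));
	assert(!demo_flags_test(flags, 3));
}
```

Whether the by-value copies really disappear is something to confirm by
looking at the generated assembly for the compiler and arch in question.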

And for instance the assembly I shared earlier was built using clang, as I
now use clang for _all_ my builds locally.

>
>       Arnd

Thanks, Lorenzo
