On Sat Dec 6, 2025 at 5:22 AM JST, Timur Tabi wrote:
> On Fri, 2025-12-05 at 09:35 +0900, Alexandre Courbot wrote:
>> 
>> With one caveat: `new` now returns a 4K object on the stack, which we
>> definitely want to avoid. So maybe we can have a wrapper for things we
>> want to align the 4K:
>> 
>>   #[repr(C)]
>>   pub(crate) struct GspPageAligned<T> {
>>     pub(crate) inner: T,
>>     padding: [u8; GSP_PAGE_SIZE - core::mem::size_of::<T>()],
>>   }
>> 
>> We would then allocate the CoherentAllocation using a
>> `GspPageAligned<GspArgumentsCached>`, and initialize its useful data
>> with:
>> 
>>   dma_write!(rmargs[0].inner = fw::GspArgumentsCached::new(&cmdq))?;
>
> I'm confused.  Aren't we already avoiding the stack?  This is the code today:
>
>         let rmargs = CoherentAllocation::<GspArgumentsCached>::alloc_coherent(
>             dev,
>             1,
>             GFP_KERNEL | __GFP_ZERO,
>         )?;
>         dma_write!(rmargs[0] = fw::GspArgumentsCached::new(&cmdq))?;
>
> The only difference with what's there today vs what you suggest is the 
> ".inner", and I think I can
> avoid even that if I make GspPageAligned a tuple instead of a named struct.

`fw::GspArgumentsCached::new(&cmdq)` does return the value on the stack
before it is stored in the coherent allocation by `dma_write`.

>
>> 
>> > I had to remove the #[repr(transparent)].  Is that
>> > okay?  The code compiles and seems to work.
>> 
>> As long as the struct is `repr(C)`, the layout will be what we expect.
>> Actually this made me realize that `repr(C)` is technically what we want for 
>> our
>> bindings abstractions, not `repr(transparent)` - both happen to have the
>> same effect since the wrapper struct is `repr(C)` anyway, but the latter
>> is more restrictive than we need.
>> 
>> Glad we found an elegant way to address this!
>
> Actually, I think a more elegant solution would be a new variant of
> CoherentAllocation::alloc_coherent() that takes a size to allocate instead of 
> using size_of::<T>.  
>
> In fact, I wonder if it makes sense to always grow the size of the allocation 
> to the nearest page,
> since dma_alloc_attrs() always allocates whole pages anyway.  Perhaps 
> CoherentAllocation<T> needs an
> allocated() method in addition to a size() method.  size() returns 
> count*sizeof as today, and
> allocated() returns that value rounded up to the nearest PAGE_SIZE.

Technically coherent allocations are always page-aligned, but that's an
implementation detail. Callers should explicitly align if they have that
requirement.

Reply via email to