* Dave Hansen <[email protected]> wrote:

> The FPU rewrite removed the dynamic allocations of 'struct fpu'.
> But, this potentially wastes massive amounts of memory (2k per
> task on systems that do not have AVX-512 for instance).
> 
> Instead of having a separate slab, this patch just appends the
> space that we need to the 'task_struct' which we dynamically
> allocate already.  This saves from doing an extra slab allocation
> at fork().  The only real downside here is that we have to stick
> everything at the end of the task_struct.  But, I think the
> BUILD_BUG_ON()s I stuck in there should keep that from being too
> fragile.
> 
> This survives a quick build and boot in a VM.  Does anyone see any
> real downsides to this?

So considering the complexity of the other patch that makes the static
allocation, I'd massively prefer this patch as it solves the real bug.

It should also work on future hardware a lot better.

This was the dynamic approach I suggested in our discussion of the big FPU
code rework.

> --- a/arch/x86/kernel/fpu/init.c~dynamically-allocate-struct-fpu      2015-07-16 10:50:42.355571648 -0700
> +++ b/arch/x86/kernel/fpu/init.c      2015-07-16 12:02:15.284280976 -0700
> @@ -136,6 +136,45 @@ static void __init fpu__init_system_gene
>  unsigned int xstate_size;
>  EXPORT_SYMBOL_GPL(xstate_size);
>  
> +#define CHECK_MEMBER_AT_END_OF(TYPE, MEMBER) \
> +     BUILD_BUG_ON((sizeof(TYPE) -                    \
> +                     offsetof(TYPE, MEMBER) -        \
> +                     sizeof(((TYPE *)0)->MEMBER)) >  \
> +                     0)                              \
> +
> +/*
> + * We append the 'struct fpu' to the task_struct.
> + */
> +int __weak arch_task_struct_size(void)

This should not be __weak, otherwise we risk getting the generic version:

> --- a/kernel/fork.c~dynamically-allocate-struct-fpu   2015-07-16 10:50:42.357571739 -0700
> +++ b/kernel/fork.c   2015-07-16 11:25:53.873498188 -0700
> @@ -287,15 +287,21 @@ static void set_max_threads(unsigned int
>       max_threads = clamp_t(u64, threads, MIN_THREADS, MAX_THREADS);
>  }
>  
> +int __weak arch_task_struct_size(void)
> +{
> +     return sizeof(struct task_struct);
> +}
> +

Your system probably worked due to link order preferring the x86 version, but
I'm not sure.

Other than this bug it looks good to me in principle.
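
As an aside, here is a minimal user-space sketch of the end-of-struct check
and of the size computation that appending the FPU state implies. The
'demo_*' names and the 64-byte state buffer are made up for illustration, and
static_assert() stands in for BUILD_BUG_ON(); this is not the code from the
patch:

```c
#include <assert.h>	/* static_assert (C11) */
#include <stddef.h>	/* offsetof, size_t */

/* Hypothetical stand-ins for 'struct fpu' and 'task_struct'. */
struct demo_fpu {
	unsigned char state[64];	/* worst-case register state */
};

struct demo_task {
	long pid;
	struct demo_fpu fpu;		/* must stay the last member */
};

/* Build fails if anything (a member or padding) follows MEMBER in TYPE. */
#define CHECK_MEMBER_AT_END_OF(TYPE, MEMBER)				\
	static_assert(sizeof(TYPE) ==					\
		      offsetof(TYPE, MEMBER) +				\
		      sizeof(((TYPE *)0)->MEMBER),			\
		      #MEMBER " is not at the end of " #TYPE)

CHECK_MEMBER_AT_END_OF(struct demo_task, fpu);

/*
 * Bytes to allocate per task when the FPU state needs only
 * 'state_size' bytes instead of the full sizeof(struct demo_fpu).
 */
size_t demo_task_alloc_size(size_t state_size)
{
	return offsetof(struct demo_task, fpu) + state_size;
}
```

With a 16-byte state, demo_task_alloc_size(16) comes out smaller than
sizeof(struct demo_task), which is the kind of per-task saving the patch is
after. Moving 'fpu' away from the end of 'struct demo_task' breaks the build
instead of silently corrupting whatever follows it.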

Lemme check it on various hardware.

Thanks,

        Ingo