On Thu, Oct 1, 2020 at 1:43 PM Chang S. Bae <[email protected]> wrote:
>
> Intel's Extended Feature Disable (XFD) feature is an extension of the XSAVE
> architecture. XFD allows the kernel to enable a feature state in XCR0 and
> to receive a #NM trap when a task uses instructions accessing that state.
> In this way, Linux can allocate the large task->fpu buffer only for tasks
> that use it.
>
> XFD introduces two MSRs: IA32_XFD to enable/disable the feature and
> IA32_XFD_ERR to assist the #NM trap handler. Both use the same
> state-component bitmap format, used by XCR0.
>
> Use this hardware capability to find the right time to expand the xstate
> area. Introduce two sets of helper functions for that:
>
> 1. The first set is primarily for interacting with the XFD hardware
>    feature. Helpers for configuring disablement, e.g. in context
>    switching, are:
>        xdisable_setbits()
>        xdisable_getbits()
>        xdisable_switch()
>
> 2. The second set is for managing the first-use status and handling the
>    #NM trap:
>        xfirstuse_enabled()
>        xfirstuse_not_detected()
>        xfirstuse_event_handler()
>
> The #NM handler induces the xstate area expansion to save the first-used
> states.
>
> No functional change until the kernel enables dynamic user states and XFD.
>
> Signed-off-by: Chang S. Bae <[email protected]>
> Reviewed-by: Len Brown <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> ---
>  arch/x86/include/asm/cpufeatures.h  |  1 +
>  arch/x86/include/asm/fpu/internal.h | 53 ++++++++++++++++++++++++++++-
>  arch/x86/include/asm/msr-index.h    |  2 ++
>  arch/x86/kernel/fpu/core.c          | 37 ++++++++++++++++++++
>  arch/x86/kernel/fpu/xstate.c        | 34 ++++++++++++++++--
>  arch/x86/kernel/process.c           |  5 +++
>  arch/x86/kernel/process_32.c        |  2 +-
>  arch/x86/kernel/process_64.c        |  2 +-
>  arch/x86/kernel/traps.c             |  3 ++
>  9 files changed, 133 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> index 2901d5df4366..7d7fe1d82966 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -274,6 +274,7 @@
>  #define X86_FEATURE_XSAVEC		(10*32+ 1) /* XSAVEC instruction */
>  #define X86_FEATURE_XGETBV1		(10*32+ 2) /* XGETBV with ECX = 1 instruction */
>  #define X86_FEATURE_XSAVES		(10*32+ 3) /* XSAVES/XRSTORS instructions */
> +#define X86_FEATURE_XFD			(10*32+ 4) /* eXtended Feature Disabling */
>
>  /*
>   * Extended auxiliary flags: Linux defined - for features scattered in various
> diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
> index 3b03ead87a46..f5dbbaa060fb 100644
> --- a/arch/x86/include/asm/fpu/internal.h
> +++ b/arch/x86/include/asm/fpu/internal.h
> @@ -572,11 +572,60 @@ static inline void switch_fpu_prepare(struct fpu *old_fpu, int cpu)
>   * Misc helper functions:
>   */
>
> +/* The first-use detection helpers: */
> +
> +static inline void xdisable_setbits(u64 value)
> +{
> +	wrmsrl_safe(MSR_IA32_XFD, value);
> +}
> +
> +static inline u64 xdisable_getbits(void)
> +{
> +	u64 value;
> +
> +	rdmsrl_safe(MSR_IA32_XFD, &value);
> +	return value;
> +}
> +
> +static inline u64 xfirstuse_enabled(void)
> +{
> +	/* All the dynamic user components are first-use enabled. */
> +	return xfeatures_mask_user_dynamic;
> +}
> +
> +/*
> + * Convert fpu->firstuse_bv to xdisable configuration in MSR IA32_XFD.
> + * xdisable_setbits() only uses this.
> + */
> +static inline u64 xfirstuse_not_detected(struct fpu *fpu)
> +{
> +	u64 firstuse_bv = (fpu->state_mask & xfirstuse_enabled());
> +
> +	/*
> +	 * If first-use is not detected, set the bit. If the detection is
> +	 * not enabled, the bit is always zero in firstuse_bv. So, make
> +	 * following conversion:
> +	 */
> +	return (xfirstuse_enabled() ^ firstuse_bv);
> +}
> +
> +/* Update MSR IA32_XFD based on fpu->firstuse_bv */
> +static inline void xdisable_switch(struct fpu *prev, struct fpu *next)
> +{
> +	if (!static_cpu_has(X86_FEATURE_XFD) || !xfirstuse_enabled())
> +		return;
> +
> +	if (unlikely(prev->state_mask != next->state_mask))
> +		xdisable_setbits(xfirstuse_not_detected(next));
> +}
> +
> +bool xfirstuse_event_handler(struct fpu *fpu);
> +
>  /*
>   * Load PKRU from the FPU context if available. Delay loading of the
>   * complete FPU state until the return to userland.
>   */
> -static inline void switch_fpu_finish(struct fpu *new_fpu)
> +static inline void switch_fpu_finish(struct fpu *old_fpu, struct fpu *new_fpu)
>  {
>  	u32 pkru_val = init_pkru_value;
>  	struct pkru_state *pk;
> @@ -586,6 +635,8 @@ static inline void switch_fpu_finish(struct fpu *new_fpu)
>
>  	set_thread_flag(TIF_NEED_FPU_LOAD);
>
> +	xdisable_switch(old_fpu, new_fpu);
> +
>  	if (!cpu_feature_enabled(X86_FEATURE_OSPKE))
>  		return;
>
> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
> index 2859ee4f39a8..0ccbe8cc99ad 100644
> --- a/arch/x86/include/asm/msr-index.h
> +++ b/arch/x86/include/asm/msr-index.h
> @@ -610,6 +610,8 @@
>  #define MSR_IA32_BNDCFGS_RSVD		0x00000ffc
>
>  #define MSR_IA32_XSS			0x00000da0
> +#define MSR_IA32_XFD			0x000001c4
> +#define MSR_IA32_XFD_ERR		0x000001c5
>
>  #define MSR_IA32_APICBASE		0x0000001b
>  #define MSR_IA32_APICBASE_BSP		(1<<8)
> diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
> index ece6428ba85b..2e07bfcd54b3 100644
> --- a/arch/x86/kernel/fpu/core.c
> +++ b/arch/x86/kernel/fpu/core.c
> @@ -518,3 +518,40 @@ int fpu__exception_code(struct fpu *fpu, int trap_nr)
>  	 */
>  	return 0;
>  }
> +
> +bool xfirstuse_event_handler(struct fpu *fpu)
> +{
> +	bool handled = false;
> +	u64 event_mask;
> +
> +	/* Check whether the first-use detection is running. */
> +	if (!static_cpu_has(X86_FEATURE_XFD) || !xfirstuse_enabled())
> +		return handled;
> +
> +	rdmsrl_safe(MSR_IA32_XFD_ERR, &event_mask);
NAK. MSR_IA32_XFD_ERR needs to be wired up in the exception handler itself, not in some helper called farther down the stack.

But this raises an interesting point -- what happens if allocation fails? I think that, from kernel code, we simply cannot support this exception mechanism. If kernel code wants to use AMX (and that would be very strange indeed), it should call x86_i_am_crazy_amx_begin() and handle errors, not rely on exceptions.

From user code, I assume we send a signal if allocation fails.

