On Wed, Dec 5, 2018 at 3:20 PM Sean Christopherson
<sean.j.christopher...@intel.com> wrote:
>
> Intel Software Guard Extensions (SGX) introduces a new CPL3-only
> enclave mode that runs as a sort of black box shared object that is
> hosted by an untrusted normal CPL3 process.
>
> Enclave transitions have semantics that are a lovely blend of SYSCALL,
> SYSRET and VM-Exit. In a non-faulting scenario, entering and exiting
> an enclave can only be done through SGX-specific instructions, EENTER
> and EEXIT respectively. EENTER+EEXIT is analogous to SYSCALL+SYSRET,
> e.g. EENTER/SYSCALL load RCX with the next RIP and EEXIT/SYSRET load
> RIP from R{B,C}X.
>
> But in a faulting/interrupting scenario, enclave transitions act more
> like VM-Exit and VMRESUME. Maintaining the black box nature of the
> enclave means that hardware must automatically switch CPU context when
> an Asynchronous Exiting Event (AEE) occurs, an AEE being any interrupt
> or exception (exceptions are AEEs because asynchronous in this context
> is relative to the enclave and not CPU execution, e.g. the enclave
> doesn't get an opportunity to save/fuzz CPU state).
>
> Like VM-Exits, all AEEs jump to a common location, referred to as the
> Asynchronous Exiting Point (AEP). The AEP is specified at enclave entry
> via a register passed to EENTER/ERESUME, similar to how the hypervisor
> specifies the VM-Exit point (via VMCS.HOST_RIP at VMLAUNCH/VMRESUME).
> Resuming the enclave/VM after the exiting event is handled is done via
> ERESUME/VMRESUME respectively. In SGX, for AEEs that are handled by the
> kernel, e.g. INTR, NMI and most page faults, IRET will journey back to
> the AEP, which then ERESUMEs the enclave.
>
> Enclaves also behave a bit like VMs in the sense that they can generate
> exceptions as part of their normal operation that for all intents and
> purposes need to be handled in the enclave/VM. However, unlike VMX, SGX
> doesn't allow the host to modify its guest's, a.k.a.
> enclave's, state,
> as doing so would circumvent the enclave's security. So to handle an
> exception, the enclave must first be re-entered through the normal
> EENTER flow (SYSCALL/SYSRET behavior), and then resumed via ERESUME
> (VMRESUME behavior) after the source of the exception is resolved.
>
> All of the above is just the tip of the iceberg when it comes to running
> an enclave. But, SGX was designed in such a way that the host process
> can utilize a library to build, launch and run an enclave. This is
> roughly analogous to how e.g. libc implementations are used by most
> applications so that the application can focus on its business logic.
>
> The big gotcha is that because enclaves can generate *and* handle
> exceptions, any SGX library must be prepared to handle nearly any
> exception at any time (well, any time a thread is executing in an
> enclave). In Linux, this means the SGX library must register a
> signal handler in order to intercept relevant exceptions and forward
> them to the enclave (or in some cases, take action on behalf of the
> enclave). Unfortunately, Linux's signal mechanism doesn't mesh well
> with libraries, e.g. signal handlers are process wide, are difficult
> to chain, etc... This becomes particularly nasty when using multiple
> levels of libraries that register signal handlers, e.g. running an
> enclave via cgo inside of the Go runtime.
>
> In comes vDSO to save the day. Now that vDSO can fixup exceptions,
> add a function to wrap enclave transitions and intercept any exceptions
> that occur in the enclave or on EENTER/ERESUME. The actual code is
> blissfully short (especially compared to this changelog).
>
> In addition to the obvious trapnr, error_code and address, propagate
> the leaf number, i.e. RAX, back to userspace so that the caller can know
> whether the fault occurred in the enclave or if it occurred on EENTER.
> A fault on EENTER generally means the enclave has died and needs to be
> restarted.
>
> Suggested-by: Andy Lutomirski <l...@amacapital.net>
> Cc: Andy Lutomirski <l...@amacapital.net>
> Cc: Jarkko Sakkinen <jarkko.sakki...@linux.intel.com>
> Cc: Dave Hansen <dave.han...@linux.intel.com>
> Cc: Josh Triplett <j...@joshtriplett.org>
> Signed-off-by: Sean Christopherson <sean.j.christopher...@intel.com>
> ---
>  arch/x86/entry/vdso/Makefile      |   1 +
>  arch/x86/entry/vdso/vdso.lds.S    |   1 +
>  arch/x86/entry/vdso/vsgx_eenter.c | 108 ++++++++++++++++++++++++++++++
>  3 files changed, 110 insertions(+)
>  create mode 100644 arch/x86/entry/vdso/vsgx_eenter.c
>
> diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
> index eb543ee1bcec..ba46673076bd 100644
> --- a/arch/x86/entry/vdso/Makefile
> +++ b/arch/x86/entry/vdso/Makefile
> @@ -18,6 +18,7 @@ VDSO32-$(CONFIG_IA32_EMULATION) := y
>
>  # files to link into the vdso
>  vobjs-y := vdso-note.o vclock_gettime.o vgetcpu.o
> +vobjs-$(VDSO64-y) += vsgx_eenter.o
>
>  # files to link into kernel
>  obj-y += vma.o extable.o
> diff --git a/arch/x86/entry/vdso/vdso.lds.S b/arch/x86/entry/vdso/vdso.lds.S
> index d3a2dce4cfa9..e422c4454f34 100644
> --- a/arch/x86/entry/vdso/vdso.lds.S
> +++ b/arch/x86/entry/vdso/vdso.lds.S
> @@ -25,6 +25,7 @@ VERSION {
>  		__vdso_getcpu;
>  		time;
>  		__vdso_time;
> +		__vdso_sgx_eenter;
>  	local: *;
>  	};
>  }
> diff --git a/arch/x86/entry/vdso/vsgx_eenter.c b/arch/x86/entry/vdso/vsgx_eenter.c
> new file mode 100644
> index 000000000000..3df4a95a34cc
> --- /dev/null
> +++ b/arch/x86/entry/vdso/vsgx_eenter.c
> @@ -0,0 +1,108 @@
> +// SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause)
> +// Copyright(c) 2018 Intel Corporation.
> +
> +#include <uapi/linux/errno.h>
> +#include <uapi/linux/types.h>
> +
> +#include "extable.h"
> +
> +/*
> + * This struct will be defined elsewhere in the actual implementation,
> + * e.g. arch/x86/include/uapi/asm/sgx.h.
> + */
> +struct sgx_eenter_fault_info {
> +	__u32 leaf;
> +	__u16 trapnr;
> +	__u16 error_code;
> +	__u64 address;
> +};
> +
> +/*
> + * ENCLU (ENCLave User) is an umbrella instruction for a variety of CPL3
> + * SGX functions. The ENCLU function that is executed is specified in EAX,
> + * with each function potentially having more leaf-specific operands beyond
> + * EAX. In the vDSO we're only concerned with the leaves that are used to
> + * transition to/from the enclave.
> + */
> +enum sgx_enclu_leaves {
> +	SGX_EENTER = 2,
> +	SGX_ERESUME = 3,
> +	SGX_EEXIT = 4,
> +};
> +
> +notrace long __vdso_sgx_eenter(void *tcs, void *priv,
> +			       struct sgx_eenter_fault_info *fault_info)
> +{
> +	u32 trapnr, error_code;
> +	long leaf;
> +	u64 addr;
> +
> +	/*
> +	 * %eax = EENTER
> +	 * %rbx = tcs
> +	 * %rcx = do_eresume
> +	 * %rdi = priv
> +	 * do_eenter:
> +	 *   enclu
> +	 *   jmp out
> +	 *
> +	 * do_eresume:
> +	 *   enclu
> +	 *   ud2

Is the only reason for do_eresume to be different from do_eenter so
that you can do the ud2?

> +	 *
> +	 * out:
> +	 *   <return to C code>
> +	 *
> +	 * fault_fixup:
> +	 *   <extable loads RDI, RSI and RDX with fault info>
> +	 *   jmp out
> +	 */

This has the IMO excellent property that it's extremely awkward to use
it for a model where the enclave is reentrant. I think it's excellent
because reentrancy on the same enclave thread is just asking for
severe bugs. Of course, I fully expect the SDK to emulate reentrancy,
but then it's 100% their problem :) On the flip side, it means that
you can't really recover from a reported fault, even if you want to,
because there's no way to ask for ERESUME. So maybe the API should
allow that after all.

I think it might be polite to at least give some out regs, maybe RSI and RDI?
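
To make the recovery question concrete, here is a rough userspace-side
sketch of what a caller might do if the wrapper let it choose the ENCLU
leaf. None of this is in the posted patch: the sgx_enclu_fn signature,
run_enclave(), the fault budget and the fake_enclu() test double are all
hypothetical, and the sketch assumes the wrapper returns 0 on a clean
EEXIT and nonzero with the fault info filled in otherwise. It also
assumes a fault taken inside the enclave is reported with leaf ==
ERESUME (the value hardware leaves in RAX on an asynchronous exit),
whereas a fault on the EENTER instruction itself is reported with
leaf == EENTER.

```c
#include <stddef.h>
#include <stdint.h>

/* Mirrors the struct in the patch, spelled with <stdint.h> types. */
struct sgx_eenter_fault_info {
	uint32_t leaf;
	uint16_t trapnr;
	uint16_t error_code;
	uint64_t address;
};

enum sgx_enclu_leaves { SGX_EENTER = 2, SGX_ERESUME = 3, SGX_EEXIT = 4 };

/*
 * HYPOTHETICAL leaf-selectable wrapper. The posted __vdso_sgx_eenter()
 * hard-codes EENTER; this signature is only an illustration.
 * ASSUMPTION: returns 0 on EEXIT, nonzero with *fi filled on a fault.
 */
typedef long (*sgx_enclu_fn)(uint32_t leaf, void *tcs, void *priv,
			     struct sgx_eenter_fault_info *fi);

enum run_result { RUN_OK, RUN_ENCLAVE_DEAD };

/*
 * Enter the enclave and recover from in-enclave faults by re-entering
 * with EENTER (so the enclave can run its own exception handler) and
 * then asking for ERESUME. Gives up after max_faults recoveries.
 */
static inline enum run_result run_enclave(sgx_enclu_fn enclu, void *tcs,
					  void *call_priv, void *handler_priv,
					  int max_faults)
{
	struct sgx_eenter_fault_info fi = { 0 };
	long ret = enclu(SGX_EENTER, tcs, call_priv, &fi);

	while (ret != 0) {
		/* A fault on EENTER itself: treat the enclave as dead. */
		if (fi.leaf == SGX_EENTER || max_faults-- == 0)
			return RUN_ENCLAVE_DEAD;

		/* Let the enclave handle the exception, then resume it. */
		if (enclu(SGX_EENTER, tcs, handler_priv, &fi) != 0)
			return RUN_ENCLAVE_DEAD;
		ret = enclu(SGX_ERESUME, tcs, NULL, &fi);
	}
	return RUN_OK;
}

/* Test double standing in for the vDSO call; not real SGX behavior. */
static int fake_faults_left;	/* in-enclave faults still to raise */
static int fake_in_fault;	/* set while a fault awaits handling */

static inline long fake_enclu(uint32_t leaf, void *tcs, void *priv,
			      struct sgx_eenter_fault_info *fi)
{
	(void)tcs;
	(void)priv;

	if (leaf == SGX_EENTER && fake_in_fault)
		return 0;	/* handler entry ran and EEXITed */

	fake_in_fault = 0;
	if (fake_faults_left == 0)
		return 0;	/* enclave ran to completion */

	fake_faults_left--;
	fake_in_fault = 1;
	fi->leaf = SGX_ERESUME;	/* fault taken inside the enclave */
	fi->trapnr = 6;		/* #UD, chosen arbitrarily for the fake */
	fi->error_code = 0;
	fi->address = 0;
	return -1;
}
```

With a wrapper that can only ever issue EENTER, the ERESUME step in the
loop has nowhere to go, which is the gap described above. Note that in
this sketch a fault on the ERESUME instruction itself is
indistinguishable from a fault inside the enclave, so the fault budget
is the only thing that stops the loop in that case.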