sgx: Add EPC OOM path to forcefully reclaim EPC

Haitao Huang Mon, 16 Oct 2023 12:52:38 -0700

On Mon, 16 Oct 2023 05:57:36 -0500, Huang, Kai <[email protected]> wrote:

On Thu, 2023-10-12 at 08:27 -0500, Haitao Huang wrote:
On Tue, 10 Oct 2023 19:51:17 -0500, Huang, Kai <[email protected]> wrote:
[...]
> (btw, even if you track VA/SECS pages in the unreclaimable list, given they
> both have 'enclave' as the owner, do you still need SGX_EPC_OWNER_ENCL and
> SGX_EPC_OWNER_PAGE ?)

Let me think about it, there might also be a way to just track encl objects,
not unreclaimable pages.
I still don't get why we need to kill the VM rather than just remove enough pages.
Is it because the static allocation cannot be reclaimed?
We can choose to "just remove enough EPC pages". The VM may or may not be
killed when it wants the EPC pages back, depending on whether the current EPC
cgroup can provide enough EPC pages or not.  And this depends on how we
implement the cgroup algorithm to reclaim EPC pages.
One problem could be: for an EPC cgroup that only has SGX VMs, you may end up
moving EPC pages from one VM to another and then vice versa endlessly,

This could be a valid use case though if people intend to share EPCs between two VMs. I understand no one would be able to use VMs this way currently with the static allocation.

because
you never really actually mark any VM to be dead just like OOM does to the
normal enclaves.

From this point, you still need a way to kill a VM, IIUC.
I think the key point of virtual EPC vs cgroup, as quoted from Sean, should be
"having sane, well-defined behavior".
Does "just remove enough EPC pages" meet this? If the problem mentioned above
can be avoided, I suppose so? So if there's an easy way to achieve it, I guess it
can be an option too.

But for the initial support, IMO we are not looking for a perfect but yet
complicated solution. I would say, from a reviewer's point of view, it's preferred
to have a simple implementation to achieve a not-perfect, but consistent,
well-defined behaviour.
So to me, killing the VM when the cgroup cannot reclaim any more EPC pages looks like a
simple option.
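
To make the "simple option" concrete, here is a minimal user-space sketch of that policy: reclaim whatever is reclaimable first, and only if the cgroup is still over its limit, kill whole VMs until usage fits. This is a toy model, not kernel code. Every name in it (`toy_vm`, `toy_cgroup`, `toy_enforce_limit`) is hypothetical and does not come from the patch series, and picking victims in array order is an arbitrary assumption.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model; none of these names exist in the patch series. */
struct toy_vm {
    size_t epc_pages;   /* EPC pages statically committed to this VM */
    bool   killed;
};

struct toy_cgroup {
    size_t limit;           /* EPC page limit of the cgroup */
    size_t reclaimable;     /* pages of normal enclaves that can be swapped out */
    size_t unreclaimable;   /* assumed equal to the sum of all live VMs' epc_pages */
    struct toy_vm *vms;
    size_t nr_vms;
};

static size_t toy_usage(const struct toy_cgroup *cg)
{
    return cg->reclaimable + cg->unreclaimable;
}

/* Bring usage under the limit; returns the number of VMs killed. */
size_t toy_enforce_limit(struct toy_cgroup *cg)
{
    size_t killed = 0;

    /* Step 1: reclaim ordinary reclaimable pages, up to the excess. */
    if (toy_usage(cg) > cg->limit) {
        size_t excess = toy_usage(cg) - cg->limit;
        size_t take = excess < cg->reclaimable ? excess : cg->reclaimable;
        cg->reclaimable -= take;
    }

    /* Step 2: still over the limit, so kill whole VMs (array order is an
     * arbitrary victim-selection assumption) instead of shuffling pages. */
    for (size_t i = 0; i < cg->nr_vms && toy_usage(cg) > cg->limit; i++) {
        struct toy_vm *vm = &cg->vms[i];

        if (vm->killed)
            continue;
        cg->unreclaimable -= vm->epc_pages;
        vm->epc_pages = 0;
        vm->killed = true;
        killed++;
    }
    return killed;
}
```

Because a killed VM releases all of its pages at once and is marked dead, the loop terminates and pages never bounce back and forth between VMs.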
But I might have missed something, especially since the middle of last week I have
been having fever and headache :-)

So as mentioned above, you can try other alternatives, but please avoid
complicated ones.
Also, I guess it will be helpful if we can understand the typical SGX app and/or
SGX VM deployment under the EPC cgroup use case. This may help us justify why
the EPC cgroup algorithm for selecting a victim is reasonable.

From this perspective, I think the current implementation is "well-defined": EPC cgroup limits for VMs are only enforced at VM launch time, not runtime. In practice, an SGX VM can only be launched with a fixed EPC size, and all those EPCs are fully committed to the VM once launched. Because of that, I imagine people are using VMs primarily to partition the physical EPCs, i.e., the static size itself is the 'limit' for the workload of a single VM, and they are not expecting EPCs to be taken away at runtime.
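
A minimal sketch of that launch-time-only behavior, as a toy user-space model rather than the actual cgroup code. The names `toy_epc_cgroup`, `toy_vm_launch` and `toy_set_limit` are hypothetical, made up for illustration only.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model; these names do not exist in the kernel. */
struct toy_epc_cgroup {
    size_t limit;       /* EPC page limit of the cgroup */
    size_t committed;   /* pages statically committed to launched VMs */
};

/* The limit is checked only here, when a VM asks for its fixed EPC size. */
bool toy_vm_launch(struct toy_epc_cgroup *cg, size_t vm_epc_pages)
{
    if (cg->committed > cg->limit ||
        vm_epc_pages > cg->limit - cg->committed)
        return false;           /* launch refused, nothing committed */
    cg->committed += vm_epc_pages;
    return true;
}

/* Changing the limit at runtime takes nothing away from running VMs. */
void toy_set_limit(struct toy_epc_cgroup *cg, size_t new_limit)
{
    cg->limit = new_limit;      /* no reclaim, no kill */
}
```

In this model, lowering the limit below what is already committed only blocks future launches, which matches the point that the static size acts as the effective limit for a running VM.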


So killing does not really add much value for the existing usages IIUC.

That said, I don't anticipate that adding the enforcement of killing VMs at runtime would break such usages, as the admin/user can simply choose to set the limit equal to the static size to launch the VM and forget about it.

Given that, I'll propose an add-on patch to this series as an RFC and get some feedback from the community before we decide whether that needs to be included in the first version or we can skip it until we have EPC reclaiming for VMs.


Thanks
Haitao

Re: [PATCH v5 12/18] x86/sgx: Add EPC OOM path to forcefully reclaim EPC