On Wed May 15, 2024 at 5:15 PM EEST, Dave Hansen wrote:
> On 5/15/24 06:54, Jarkko Sakkinen wrote:
> > I'd cut 90% of the description and just make the argument about
> > the wrong error code, and be done. The sequence is great for showing
> > how this could happen. The prose makes my head hurt, tbh.
>
> The changelog is too long, but not fatally so.  I'd much rather have a
> super verbose description than something super sparse.
>
> Would something like this make more sense to folks?
>
>       Imagine an mmap()'d file. Two threads touch the same address at
>       the same time and fault. Both allocate a physical page and race
>       to install a PTE for that page. Only one will win the race. The
>       loser frees its page, but still continues handling the fault as
>       a success and returns VM_FAULT_NOPAGE from the fault handler.
>
>       The same race can happen with SGX. But there's a bug: the loser
>       in the SGX code steers into a failure path. The loser EREMOVEs the
>       winner's EPC page, then returns SIGBUS, likely killing the app.
>
>       Fix the SGX loser's behavior. Change the return code to
>       VM_FAULT_NOPAGE to avoid SIGBUS, and call sgx_free_epc_page(),
>       which avoids EREMOVE'ing the winner's page and only frees the
>       page that the loser allocated.

Yes!

I did read the whole thing. My comment was only about the
chain of maintainers who will also have to deal with this patch
eventually.

BR, Jarkko
