On Tue, Dec 1, 2020 at 6:28 AM Mathieu Desnoyers <[email protected]> wrote: > > ----- On Dec 1, 2020, at 5:16 AM, Peter Zijlstra [email protected] wrote: > > > On Mon, Nov 30, 2020 at 09:50:35AM -0800, Andy Lutomirski wrote: > >> membarrier() carefully propagates SYNC_CORE and RSEQ actions to all > >> other CPUs, but there are two issues. > >> > >> - membarrier() does not sync_core() or rseq_preempt() the calling > >> CPU. Aside from the logic being mind-bending, this also means > >> that it may not be safe to modify user code through an alias, > >> call membarrier(), and then jump to a different executable alias > >> of the same code. > > > > I always understood this to be on purpose. The calling CPU can fix up > > itself just fine. The pain point is fixing up the other CPUs, and that's > > where membarrier() helps. > > Indeed, as documented in the man page: > > MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE (since Linux 4.16) > In addition to providing the memory ordering guarantees de‐ > scribed in MEMBARRIER_CMD_PRIVATE_EXPEDITED, upon return from > system call the calling thread has a guarantee that all its run‐ > ning thread siblings have executed a core serializing instruc‐ > tion. This guarantee is provided only for threads in the same > process as the calling thread. > > membarrier sync core guarantees a core serializing instruction on the > siblings, > not on the caller thread. This has been done on purpose given that the caller > thread can always issue its core serializing instruction from user-space on > its own. > > > > > That said, I don't mind including self, these aren't fast calls by any > > means. > > I don't mind including self either, but this would require documentation > updates, including man pages, to state that starting from kernel Y this > is the guaranteed behavior. It's then tricky for user-space to query what > the behavior is unless we introduce a new membarrier command for it. So this > could introduce issues if software written for the newer kernels runs on older > kernels.
For rseq at least, if we do this now we don't have this issue -- I don't think any released kernel has the rseq mode. > > > > >> - membarrier() does not explicitly sync_core() remote CPUs either; > >> instead, it relies on the assumption that an IPI will result in a > >> core sync. On x86, I think this may be true in practice, but > >> it's not architecturally reliable. In particular, the SDM and > >> APM do not appear to guarantee that interrupt delivery is > >> serializing. > > > > Right, I don't think we rely on that, we do rely on interrupt delivery > > providing order though -- as per the previous email. > > > >> On a preemptible kernel, IPI return can schedule, > >> thereby switching to another task in the same mm that was > >> sleeping in a syscall. The new task could then SYSRET back to > >> usermode without ever executing IRET. > > > > This; I think we all overlooked this scenario. > > Indeed, this is an issue which needs to be fixed. > > > > >> This patch simplifies the code to treat the calling CPU just like > >> all other CPUs, and explicitly sync_core() on all target CPUs. This > >> eliminates the need for the smp_mb() at the end of the function > >> except in the special case of a targeted remote membarrier(). This > >> patch updates that code and the comments accordingly. > > I am not confident that removing the smp_mb at the end of membarrier is > an appropriate change, nor that it simplifies the model. Ah, but I didn't remove it. I carefully made sure that every possible path through the function does an smp_mb() or stronger after all the cpu_rq reads. ipi_func(), on_each_cpu(), and the explicit smp_mb() cover the three cases. That being said, if you prefer, I can make the change to skip the calling CPU, in which case I'll leave the smp_mb() at the end alone.

