On Thu, Aug 06, 2020 at 12:57:26PM -0700, Dave Hansen wrote:
> On 8/6/20 12:25 PM, Ricardo Neri wrote:
> >  static inline void sync_core(void)
> >  {
> >     /*
> > -    * There are quite a few ways to do this.  IRET-to-self is nice
> > +    * Hardware can do this for us if SERIALIZE is available. Otherwise,
> > +    * there are quite a few ways to do this.  IRET-to-self is nice
> 
> This seems to imply that things other than SERIALIZE aren't the hardware
> doing this.  All of these methods are equally architecturally
> serializing *and* provided by the hardware.

Indeed, I can see how the wording may imply that.

> 
> I also don't quite get the point of separating the comments from the
> code.  Shouldn't this be:
> 
>       /*
>        * The SERIALIZE instruction is the most straightforward way to
>        * do this but it not universally available.
>        */

I regard the comment as describing all the considered options to for
architectural serialization. What about if I add SERIALIZE as another
option? I propose to discuss it towards the end of the comment:

        /*
         * There are quite a few ways to do this.  IRET-to-self is nice
         * because it works on every CPU, at any CPL (so it's compatible
         * with paravirtualization), and it never exits to a hypervisor.
         * The only down sides are that it's a bit slow (it seems to be
         * a bit more than 2x slower than the fastest options) and that
         * it unmasks NMIs.  The "push %cs" is needed because, in
         * paravirtual environments, __KERNEL_CS may not be a valid CS
         * value when we do IRET directly.
         *
         * In case NMI unmasking or performance ever becomes a problem,
         * the next best option appears to be MOV-to-CR2 and an
         * unconditional jump.  That sequence also works on all CPUs,
         * but it will fault at CPL3 (i.e. Xen PV).
         *
         * CPUID is the conventional way, but it's nasty: it doesn't
         * exist on some 486-like CPUs, and it usually exits to a
         * hypervisor.
         *
         * The SERIALIZE instruction is the most straightforward way to
         * do this as it does not clobber registers or exit to a
         * hypervisor. However, it is not universally available.
         *
         * Like all of Linux's memory ordering operations, this is a
         * compiler barrier as well.
         */

What do you think?

>       if (static_cpu_has(X86_FEATURE_SERIALIZE)) {
>               asm volatile(__ASM_SERIALIZE ::: "memory");
>               return;
>       }
> 
>       /*
>        * For all other processors, IRET-to-self is nice ...
>        */
>       iret_to_self();

Reply via email to