Hi Linus,

Thanks for having a look.

On Fri, Feb 22, 2019 at 01:49:32PM -0800, Linus Torvalds wrote:
> On Fri, Feb 22, 2019 at 10:50 AM Will Deacon wrote:
> >
> > +#ifndef mmiowb_set_pending
> > +static inline void mmiowb_set_pending(void)
> > +{
> > +       __this_cpu_write(__mmiowb_state.mmiowb_pending, 1);
> > +}
> > +#endif
> > +
> > +#ifndef mmiowb_spin_lock
> > +static inline void mmiowb_spin_lock(void)
> > +{
> > +       if (__this_cpu_inc_return(__mmiowb_state.nesting_count) == 1)
> > +               __this_cpu_write(__mmiowb_state.mmiowb_pending, 0);
> > +}
> > +#endif
> 
> The case we want to go fast is the spin-lock and unlock case, not the
> "set pending" case.
> 
> And the way you implemented this, it's exactly the wrong way around.
> 
> So I'd suggest instead doing
> 
>   static inline void mmiowb_set_pending(void)
>   {
>       __this_cpu_write(__mmiowb_state.mmiowb_pending,
>                        __mmiowb_state.nesting_count);
>   }
> 
> and
> 
>   static inline void mmiowb_spin_lock(void)
>   {
>       __this_cpu_inc(__mmiowb_state.nesting_count);
>   }
> 
> which makes that spin-lock code much simpler and avoids the conditional there.

Makes sense; I'll hook that up for the next version.
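For reference, here is a rough userspace model of the counter-based scheme. It is a sketch under stated assumptions, not the kernel code:

- A single global struct stands in for the per-CPU __mmiowb_state.
- Plain loads and stores stand in for the __this_cpu_*() accessors.
- barriers_issued is hypothetical instrumentation that counts calls to a stand-in mmiowb().

In real per-CPU code, the nesting_count value used in mmiowb_set_pending() would presumably have to be fetched with __this_cpu_read() rather than by naming the variable directly.

```c
#include <assert.h>

/* GCC/Clang branch hint, mirroring the kernel's unlikely(). */
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Userspace model: one global stands in for the per-CPU __mmiowb_state. */
struct mmiowb_state {
	unsigned short nesting_count;
	unsigned short mmiowb_pending;
};

static struct mmiowb_state state;
static int barriers_issued; /* hypothetical: counts stand-in mmiowb() calls */

/* Stand-in for the real I/O write barrier. */
static void mmiowb(void)
{
	barriers_issued++;
}

/* Called from I/O accessors: pending becomes non-zero only inside a lock. */
static void mmiowb_set_pending(void)
{
	state.mmiowb_pending = state.nesting_count;
}

/* Lock fast path: a single increment, no conditional. */
static void mmiowb_spin_lock(void)
{
	state.nesting_count++;
}

/* Unlock: read first, and only write/barrier if I/O happened. */
static void mmiowb_spin_unlock(void)
{
	assert(state.nesting_count > 0); /* catches unbalanced unlocks */
	if (unlikely(state.mmiowb_pending)) {
		state.mmiowb_pending = 0;
		mmiowb();
	}
	state.nesting_count--;
}
```

Because mmiowb_pending simply inherits nesting_count, an I/O accessor called outside any lock leaves it at zero without a branch. The lock path also becomes a single increment.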

> Then the unlock case could be something like
> 
>   static inline void mmiowb_spin_unlock(void)
>   {
>       if (unlikely(__this_cpu_read(__mmiowb_state.mmiowb_pending))) {
>           __this_cpu_write(__mmiowb_state.mmiowb_pending, 0);
>           mmiowb();
>       }
>       __this_cpu_dec(__mmiowb_state.nesting_count);
>   }
> 
> or something (xchg is generally much more expensive than read, and the
> common case for spinlocks is that nobody did IO inside of it).

So I *am* using __this_cpu_xchg() here, which means an architecture can
get away with plain old loads and stores (which is what RISC-V does, for
example). I see that's not the case on e.g. x86, though, so I'll rework
it to use __this_cpu_read() and __this_cpu_write(), since that doesn't hurt.
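A similar model can contrast the two unlock shapes. Again this is a sketch, assuming a single global in place of the per-CPU state. pending_stores is a hypothetical counter of writes to mmiowb_pending. It is only a proxy for cost: it says nothing about which instructions a given architecture actually uses to implement __this_cpu_xchg().

```c
#include <assert.h>

/* Userspace model: a single struct stands in for the per-CPU state;
 * the counters below are hypothetical instrumentation, not kernel API. */
struct mmiowb_state {
	unsigned short nesting_count;
	unsigned short mmiowb_pending;
};

static struct mmiowb_state state;
static int barriers_issued; /* calls to the stand-in mmiowb() */
static int pending_stores;  /* writes to mmiowb_pending */

static void mmiowb(void)
{
	barriers_issued++;
}

/* Exchange-style unlock: always writes mmiowb_pending, even if already 0. */
static void unlock_xchg(void)
{
	unsigned short pending;

	assert(state.nesting_count > 0);
	pending = state.mmiowb_pending; /* models __this_cpu_xchg(..., 0) */
	state.mmiowb_pending = 0;
	pending_stores++;
	if (pending)
		mmiowb();
	state.nesting_count--;
}

/* Read-first unlock: writes only when a barrier is actually needed. */
static void unlock_read_write(void)
{
	assert(state.nesting_count > 0);
	if (state.mmiowb_pending) {        /* models __this_cpu_read() */
		state.mmiowb_pending = 0;  /* models __this_cpu_write() */
		pending_stores++;
		mmiowb();
	}
	state.nesting_count--;
}
```

In the common case (no I/O inside the critical section), the read-first variant leaves mmiowb_pending untouched, whereas the exchange variant always writes it.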

Will