> From: Dale Rahn <dale.r...@gmail.com>
> Date: Tue, 8 Aug 2023 12:36:45 -0400
> 
> Switching the computation of cycles/delaycnt to a proper 64 value math
> instead of the '32 bit safe' complex math is likely a good idea.
> However I am not completely convinced that switching to 'yield' (current
> CPU_BUSY_CYCLE implementation) for every loop of a 'short wait' in the wait
> loop makes sense. In a hypervisor environment, this could cause a very
> short wait between register writes to become very long, In a non-hypervisor
> environment there is essentially no improvement because the yield doesn't
> really have any benefits on non-hypervisor.

Dale, I think you're confused here.  There is no arcitectural way to
trap a YIELD instruction.  The docs explicitly state that SMT and SMP
systems.  I suspect that this instruction was primarily introduced for
SMT systems that never materialized; I completely believe you that on
current hardware it does nothing.

Even without a YIELD instruction a hypervisor might interrupt us and
schedule a different vCPU onto the core.  And on real hardware
external interrupts may happen.  So you really can't count on delay(9)
being accurate.

Linux does use YIELD in it delay loop.

> To my current understanding, there is no useful 'wait short period' on arm
> cores.

There is WFET (and WFIT), but that is Armv8.7 material, so not
availble on actual hardware that we run on.  Linux uses that
instruction in its delay loop when available.

On machines with a working Generic Timer event stream, Linux uses WFE
if the delay is long enough.
 
> On Mon, Aug 7, 2023 at 10:32 PM Scott Cheloha <scottchel...@gmail.com>
> wrote:
> 
> > The agtimer(4/arm64) delay(9) implementation is quite complicated.
> > This patch simplifies it.
> >
> > Am I missing something here?  There's no reason to implement the
> > busy-loop like this, right?
> >
> > ok?
> >
> > Index: agtimer.c
> > ===================================================================
> > RCS file: /cvs/src/sys/arch/arm64/dev/agtimer.c,v
> > retrieving revision 1.23
> > diff -u -p -r1.23 agtimer.c
> > --- agtimer.c   25 Jul 2023 18:16:19 -0000      1.23
> > +++ agtimer.c   8 Aug 2023 02:24:57 -0000
> > @@ -323,32 +323,12 @@ agtimer_cpu_initclocks(void)
> >  void
> >  agtimer_delay(u_int usecs)
> >  {
> > -       uint64_t                clock, oclock, delta, delaycnt;
> > -       uint64_t                csec, usec;
> > -       volatile int            j;
> > +       uint64_t cycles, start;
> >
> > -       if (usecs > (0x80000000 / agtimer_frequency)) {
> > -               csec = usecs / 10000;
> > -               usec = usecs % 10000;
> > -
> > -               delaycnt = (agtimer_frequency / 100) * csec +
> > -                   (agtimer_frequency / 100) * usec / 10000;
> > -       } else {
> > -               delaycnt = agtimer_frequency * usecs / 1000000;
> > -       }
> > -       if (delaycnt <= 1)
> > -               for (j = 100; j > 0; j--)
> > -                       ;
> > -
> > -       oclock = agtimer_readcnt64();
> > -       while (1) {
> > -               for (j = 100; j > 0; j--)
> > -                       ;
> > -               clock = agtimer_readcnt64();
> > -               delta = clock - oclock;
> > -               if (delta > delaycnt)
> > -                       break;
> > -       }
> > +       start = agtimer_readcnt64();
> > +       cycles = (uint64_t)usecs * agtimer_frequency / 1000000;
> > +       while (agtimer_readcnt64() - start < cycles)
> > +               CPU_BUSY_CYCLE();
> >  }
> >
> >  void
> >
> >
> 

Reply via email to