Re: [PATCH gnumach] mach_clock: Use a callwheel as per Costello & Varghese '95

Samuel Thibault Mon, 08 Dec 2025 15:36:02 -0800

Hello,

Thanks for this!


Damien Zammit, le sam. 06 déc. 2025 06:57:37 +0000, a ecrit:
> Timeouts are now very fast to look up - almost O(1) when there are few,
> otherwise a much shorter list is traversed rather than all of them.

I wouldn't say O(1). This is simply dividing the complexity by up to
TIMEOUT_WHEEL_SIZE, at the expense of TIMEOUT_WHEEL_SIZE times more
storage.

> This is achieved by storing the elements by index into a circular array
> as the time to expiry modulo the wheel size.  Each spoke of the wheel
> stores a short queue containing the elements that match the expiry time.

Please put this as comment above the wheel definition, so people don't
have to dig in the git log to find how all of this works.

> Therefore only timeouts that are set to likely expire at elapsed_ticks
> are checked during the clock interrupt, saving unnecessary processing time.

? During clock interrupt we were not checking more than needed, we
were stopping at the first timeout after the current time. What was
expensive, however, is set_timeout() which was going through all the
previous timers, that's what we divide by up to TIMEOUT_WHEEL_SIZE.

In your current implementation, you got rid of trying to sort queue
items by expiration date. This is fine, provided we don't have too many
timers so softclock() won't have to look through too many timers, but
please explicit so in comments.

> This speeds up boot time noticably and therefore should be faster to run
> generally.

And notably because timers don't always expire, so set/reset should
really be made fast.

> The thread timers are stored within the thread structures
> themselves, so there is no arbitrary limit on their number, however for
> legacy timeouts there are NTIMERS == 20 currently allocated statically
> for things like comtimer (serial driver) which are in a reusable array.

That should be fine, yes.

> diff --git a/kern/mach_clock.c b/kern/mach_clock.c
> index d7858a79..3c12ec39 100644
> --- a/kern/mach_clock.c
> +++ b/kern/mach_clock.c
> @@ -65,12 +65,22 @@
>  #endif
>  
>  #define MICROSECONDS_IN_ONE_SECOND 1000000

Please put here a comment to reference to the Costello & Varghese '95 paper.

> +#define TIMEOUT_WHEEL_BITS 8
> +#define TIMEOUT_WHEEL_SIZE (1 << TIMEOUT_WHEEL_BITS)
> +#define TIMEOUT_WHEEL_MASK (TIMEOUT_WHEEL_SIZE - 1)
> +#define MAX_SOFTCLOCK_STEPS (TIMEOUT_WHEEL_SIZE >> 2)

You also want a

#define TIMEOUT_WHEEL_QUEUE(ticks) (&timeoutwheel[ticks & TIMEOUT_WHEEL_MASK])

do avoid rewriting that computation in all the code.

> +unsigned long  softticks = 0;          /* like elapsed_ticks but for 
> softclock() */

That is not really saying what it represents: the last tick for which we
have checked for timers.

> @@ -249,23 +258,20 @@ void clock_interrupt(
>       if (my_cpu == master_cpu) {
>  
>           spl_t s;
> -         timer_elt_t telt;
>           boolean_t   needsoft = FALSE;
>  
> -
>           /*
>            *  Update the tick count since bootup, and handle
> -          *  timeouts.
> +          *  timeouts.

Spurious change that actually breaks alignment.

>            */
>  
> -         s = simple_lock_irq(&timer_lock);
> +         s = simple_lock_irq(&twheel_lock);
>  
>           elapsed_ticks++;
>  
> -         telt = (timer_elt_t)queue_first(&timer_head.chain);
> -         if (telt->ticks <= elapsed_ticks)
> -             needsoft = TRUE;
> -         simple_unlock_irq(s, &timer_lock);
> +         if (!queue_empty(&timeoutwheel[elapsed_ticks & TIMEOUT_WHEEL_MASK]))
> +             needsoft = TRUE;

You'd also want to check the actual time of the first element, to avoid
setting a softclock when the timer is actually for much later.

> +         simple_unlock_irq(s, &twheel_lock);
>  
>           /*
>            *  Increment the time-of-day clock.
> @@ -310,122 +316,156 @@ void clock_interrupt(
>               else {
>                   setsoftclock();
>               }
> +         } else if (softticks + 1 == elapsed_ticks) {
> +             ++softticks;

Put a comment to explain this.

>           }
>       }
>       last_hpc_read = hpclock_read_counter();
>  }
>  
>  void softclock(void)
>  {
>       /*
>        *      Handle timeouts.
>        */
> +     register timeout_t t;
> +     register queue_head_t *spoke;
> +     register int steps; /* number of steps taken since we last allowed 
> interrupts */

But what is a "step" ?

> +     register unsigned long curticks;

The register qualifier is really completely useless nowadays.

>       spl_t   s;

> +     steps = 0;
> +     s = simple_lock_irq(&twheel_lock);
> +     while (softticks != elapsed_ticks) {
> +             ++softticks;
> +             curticks = softticks;
> +             spoke = &timeoutwheel[curticks & TIMEOUT_WHEEL_MASK];

> +             t = NULL;
> +             if (!queue_empty(spoke))
> +                     t = (timeout_t)queue_next(spoke);
> +             while (t) {

Replace that with
t = (timeout_t)queue_next(spoke);
while (! (t >= &timeoutwheel[0] && t < &timeoutwheel[TIMEOUT_WHEEL_SIZE]) )

so that you can get rid of all the special-casing of
if (t->chain.next == spoke)
etc.

> +                     if (t->t_time != curticks) {
> +                             /* Cannot check tail via spoke because it's the 
> wrong list */

Explain when that happens.

> +                             if (t->chain.next == 
> (queue_entry_t)&timeoutwheel[t->t_time & TIMEOUT_WHEEL_MASK])
> +                                     t = NULL;
> +                             else
> +                                     t = (timeout_t)queue_next(&t->chain);
> +
> +                             if (++steps >= MAX_SOFTCLOCK_STEPS) {
> +                                     nextsoftcheck = t;
> +                                     simple_unlock_irq(s, &twheel_lock); /* 
> Give hardclock() a chance */

Comment why we need it. That'll explain what "step" is.

I don't see why MAX_SOFTCLOCK_STEPS is expressed in terms of
TIMEOUT_WHEEL_SIZE. I don't know the research paper behind all this, but
the frequency with which we should be releasing hardclock() from times
to times doesn't seem to me related to the wheel size at all.

> +                                     s = simple_lock_irq(&twheel_lock);
> +                                     t = nextsoftcheck;
> +                                     steps = 0;
> +                             }
> +                     } else {
> +                             void (*t_fcn)(void *);
> +                             void *t_arg;
> +
> +                             if (t->chain.next == spoke)
> +                                     nextsoftcheck = NULL;
> +                             else
> +                                     nextsoftcheck = 
> (timeout_t)queue_next(&t->chain);
> +                             queue_remove(spoke, t, timeout_t, chain);
> +                             t->chain.prev = t->chain.next = NULL;
> +                             t_fcn = t->fcn;
> +                             t_arg = t->param;
> +                             if (t->set & TIMEOUT_ALLOC)
> +                                     t->set = TIMEOUT_ALLOC;
> +                             else
> +                                     t->set &= ~TIMEOUT_PENDING;
> +                             simple_unlock_irq(s, &twheel_lock);
> +                             assert(t_fcn != 0);
> +                             t_fcn(t_arg); /* call function at elapsed */
> +                             s = simple_lock_irq(&twheel_lock);
> +                             steps = 0;
> +                             t = nextsoftcheck;
> +                     }
> +             }
>       }
> +     nextsoftcheck = NULL;
> +     simple_unlock_irq(s, &twheel_lock);
>  }
>  
>  /*
>   *   Set timeout.
>   *
>   *   Parameters:
> - *           telt     timer element.  Function and param are already set.
> - *           interval time-out interval, in hz.
> + *           t        preallocated timeout. Function and param are already 
> set.
> + *           interval time-out interval, in ticks.
>   */
>  void set_timeout(
> -     timer_elt_t     telt,   /* already loaded */
> +     timeout_t       t,      /* already loaded */
>       unsigned int    interval)
>  {
> -     spl_t                   s;
> -     timer_elt_t             next;
> +     unsigned int index;
> +     spl_t s;
>  
> -     s = simple_lock_irq(&timer_lock);
> +     if (t->set & (TIMEOUT_ACTIVE | TIMEOUT_PENDING))

Does that really happen? From the existing code, it seems that
set_timeout does assume that it's not called on a timer that is already
set.

> +             reset_timeout(t);
>  
> -     /* Start counting after next tick, to avoid partial ticks.  */
> -     interval += elapsed_ticks + 1;

Note the comment: we do not want to drop that. It's important otherwise
if we are e.g. in the middle of a tick period, we will wake the timer
too early.

> +     s = simple_lock_irq(&twheel_lock);
>  
> -     for (next = (timer_elt_t)queue_first(&timer_head.chain);
> -          ;
> -          next = (timer_elt_t)queue_next((queue_entry_t)next)) {
> +     if (!interval)
> +         interval = 1;

So you don't want this.

> +     assert (!(t->set & (TIMEOUT_ACTIVE | TIMEOUT_PENDING)));
> +
> +     t->set |= TIMEOUT_ACTIVE | TIMEOUT_PENDING;
> +     t->t_time = elapsed_ticks + interval;

But rather +1 there.

> +     index = t->t_time & TIMEOUT_WHEEL_MASK;

index is used only once, so rather use the TIMEOUT_WHEEL_QUEUE macro.

> -         if (next->ticks > interval)
> -             break;
> -     }
> -     telt->ticks = interval;
>       /*
> -      * Insert new timer element before 'next'
> -      * (after 'next'->prev)
> +      * Insert new timeout at the end of the corresponding wheel entry
>        */
> -     insque((queue_entry_t) telt, ((queue_entry_t)next)->prev);
> -     telt->set = TELT_SET;
> -     simple_unlock_irq(s, &timer_lock);
> +     queue_enter(&timeoutwheel[index], t, timeout_t, chain);
> +
> +     simple_unlock_irq(s, &twheel_lock);
>  }
>  
> -boolean_t reset_timeout(timer_elt_t telt)
> +boolean_t reset_timeout(timeout_t t)
>  {
> -     spl_t   s;
> +     queue_head_t *spoke;
> +     spl_t s;
> +
> +     s = simple_lock_irq(&twheel_lock);
> +     if (!(t->set & TIMEOUT_PENDING)) {
> +             t->set &= ~TIMEOUT_ACTIVE;
> +             simple_unlock_irq(s, &twheel_lock);
> +             return FALSE;
> +     }
> +
> +     t->set &= ~(TIMEOUT_PENDING | TIMEOUT_ACTIVE);
>  
> -     s = simple_lock_irq(&timer_lock);
> -     if (telt->set) {
> -         remqueue(&timer_head.chain, (queue_entry_t)telt);
> -         telt->set = TELT_UNSET;
> -         simple_unlock_irq(s, &timer_lock);
> -         return TRUE;
> +     spoke = &timeoutwheel[t->t_time & TIMEOUT_WHEEL_MASK];
> +
> +     if (nextsoftcheck == t) {
> +             if (t->chain.next == spoke)
> +                     nextsoftcheck = NULL;
> +             else
> +                     nextsoftcheck = (timeout_t)queue_next(&t->chain);

Here again the changed while loop in softclock() will make this
special-casing away.

>       }
> -     else {
> -         simple_unlock_irq(s, &timer_lock);
> -         return FALSE;
> +
> +     if (!queue_empty(spoke)) {

? Can it happen that spoke is empty, since we know the timeout was
pending? Better assert against it.

> +             queue_remove(spoke, t, timeout_t, chain);
> +             t->chain.prev = t->chain.next = NULL;
> +             simple_unlock_irq(s, &twheel_lock);
> +             return TRUE;
>       }
> +     simple_unlock_irq(s, &twheel_lock);
> +     return FALSE;
>  }
>  

>  /*
>   * Returns a boolean indicating whether the timeout element was found
>   * and removed.
>   */
> -boolean_t untimeout(void (*fcn)( void * param ), const void *param)
> +boolean_t untimeout(timeout_t t)
>  {

> +     return reset_timeout(t);
>  }

Rather just drop the function and make chario.c call reset_timeout
directly?

> diff --git a/kern/mach_clock.h b/kern/mach_clock.h
> index e83b638c..248aab7c 100644
> --- a/kern/mach_clock.h
> +++ b/kern/mach_clock.h
> @@ -46,20 +46,20 @@ extern time_value64_t     uptime;         /* time since 
> bootup */
>  typedef void timer_func_t(void *);
>  
>  /* Time-out element.  */
> -struct timer_elt {
> -     queue_chain_t   chain;          /* chain in order of expiration */
> +struct timeout {
> +     queue_chain_t   chain;          /* chain of timeouts */

Comment that we do *not* spend the time to keep it in order of expiration.

>       timer_func_t    *fcn;           /* function to call */
>       void *          param;          /* with this parameter */
> -     unsigned long   ticks;          /* expiration time, in ticks */
> -     int             set;            /* unset | set | allocated */
> +     unsigned long   t_time;         /* expiration time, in ticks elapsed 
> from boot */
> +     unsigned char   set;            /* active | pending | allocated from 
> pool */
>  };

Re: [PATCH gnumach] mach_clock: Use a callwheel as per Costello & Varghese '95

Reply via email to