* Peter Zijlstra <[email protected]> wrote:
> Implement a latched RB-tree in order to get unconditional RCU/lockless
> lookups.
>
> Cc: Oleg Nesterov <[email protected]>
> Cc: Michel Lespinasse <[email protected]>
> Cc: Andrea Arcangeli <[email protected]>
> Cc: David Woodhouse <[email protected]>
> Cc: Rik van Riel <[email protected]>
> Cc: Mathieu Desnoyers <[email protected]>
> Cc: "Paul E. McKenney" <[email protected]>
> Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
> ---
> include/linux/rbtree_latch.h | 212 +++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 212 insertions(+)
>
> --- /dev/null
> +++ b/include/linux/rbtree_latch.h
> @@ -0,0 +1,212 @@
> +/*
> + * Latched RB-trees
> + *
> + * Copyright (C) 2015 Intel Corp., Peter Zijlstra <[email protected]>
> + *
> + * Since RB-trees have non atomic modifications they're not immediately suited
> + * for RCU/lockless queries. Even though we made RB tree lookups non-fatal for
> + * lockless lookups; we cannot guarantee they return a correct result.
> + *
> + * The simplest solution is a seqlock + rb-tree, this will allow lockless
> + * lookups; but has the constraint (inherent to the seqlock) that read sides
> + * cannot nest in write sides.
> + *
> + * If we need to allow unconditional lookups (say as required for NMI context
> + * usage) we need a more complex setup; this data structure provides this by
> + * employing the latch technique -- see @raw_write_seqcount_latch -- to
> + * implement a latched RB-tree which does allow for unconditional lookups by
> + * virtue of always having (at least) one stable copy of the tree.
> + *
> + * However, while we have the guarantee that there is at all times one stable
> + * copy, this does not guarantee an iteration will not observe modifications.
> + * What might have been a stable copy at the start of the iteration, need not
> + * remain so for the duration of the iteration.
> + *
> + * Therefore, this does require a lockless RB-tree iteration to be non-fatal;
> + * see the comment in lib/rbtree.c. Note however that we only require the first
> + * condition -- not seeing partial stores -- because the latch thing isolates
> + * us from loops. If we were to interrupt a modification the lookup would be
> + * pointed at the stable tree and complete while the modification was halted.
Minor nit: so this text has 3 variants to spell RB-trees:
RB-tree
RB tree
rb-tree
I suggest we pick one! :-)
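
For readers following along, the latch idea the quoted comment describes
could be sketched in user-space C roughly like this. It is only an
illustration under stated assumptions, not the code from the patch: the
names struct latch, latch_init(), store_copy(), latch_update() and
latch_read() are hypothetical, two plain integers stand in for the tree,
C11 sequentially consistent atomics replace the kernel's
raw_write_seqcount_latch() and its barriers, and writers are assumed to be
serialized by the caller.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical payload standing in for one copy of the tree. */
struct snapshot {
	int key;
	int value;
};

/* Hypothetical latch: two copies of the data plus a sequence counter. */
struct latch {
	atomic_uint seq;	/* even: copy 0 is stable; odd: copy 1 is stable */
	atomic_int key[2];
	atomic_int value[2];
};

static void latch_init(struct latch *l)
{
	atomic_init(&l->seq, 0);
	for (int i = 0; i < 2; i++) {
		atomic_init(&l->key[i], 0);
		atomic_init(&l->value[i], 0);
	}
}

/* Overwrite one copy; only the (single) writer calls this. */
static void store_copy(struct latch *l, unsigned int idx, struct snapshot s)
{
	assert(idx < 2);
	atomic_store(&l->key[idx], s.key);
	atomic_store(&l->value[idx], s.value);
}

/* Writer side: each copy is modified only while readers use the other. */
static void latch_update(struct latch *l, struct snapshot next)
{
	atomic_fetch_add(&l->seq, 1);	/* seq odd: readers use copy 1 */
	store_copy(l, 0, next);
	atomic_fetch_add(&l->seq, 1);	/* seq even: readers use copy 0 */
	store_copy(l, 1, next);
}

/*
 * Reader side: pick the copy selected by the low bit of seq and retry if
 * seq changed meanwhile. A reader that interrupts the writer never spins,
 * because the copy it is pointed at is not being modified.
 */
static struct snapshot latch_read(struct latch *l)
{
	struct snapshot s;
	unsigned int seq;

	do {
		seq = atomic_load(&l->seq);
		s.key = atomic_load(&l->key[seq & 1]);
		s.value = atomic_load(&l->value[seq & 1]);
	} while (seq != atomic_load(&l->seq));

	return s;
}
```

The retry loop corresponds to the caveat in the quoted comment: a copy that
was stable when the read started may stop being stable before it finishes,
so the reader has to recheck the sequence count.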
Thanks,
Ingo