On 02/24/2016 02:56 AM, Jan Kara wrote:
On Tue 23-02-16 14:04:30, Waiman Long wrote:
Linked list is used everywhere in the Linux kernel. However, if many
threads are trying to add or delete entries into the same linked list,
it can create a performance bottleneck.

This patch introduces a new per-cpu list subystem with associated
per-cpu locks for protecting each of the lists individually. This
allows list entries insertion and deletion operations to happen in
parallel instead of being serialized with a global list and lock.

List entry insertion is strictly per cpu. List deletion, however, can
happen in a cpu other than the one that did the insertion. So we still
need lock to protect the list. Because of that, there may still be
a small amount of contention when deletion is being done.

A new header file include/linux/percpu-list.h will be added with the
associated pcpu_list_head and pcpu_list_node structures. The following
functions are provided to manage the per-cpu list:

  1. int init_pcpu_list_head(struct pcpu_list_head **ppcpu_head)
  2. void pcpu_list_add(struct pcpu_list_node *node,
                       struct pcpu_list_head *head)
  3. void pcpu_list_del(struct pcpu_list *node)

Iteration of all the list entries within a group of per-cpu
lists is done by calling either the pcpu_list_iterate() or
pcpu_list_iterate_safe() functions in a while loop. They correspond
to the list_for_each_entry() and list_for_each_entry_safe() macros
respectively. The iteration states are keep in a pcpu_list_state
structure that is passed to the iteration functions.

Signed-off-by: Waiman Long<[email protected]>
Two comments below.

+/*
+ * Helper function to find the first entry of the next per-cpu list
+ * It works somewhat like for_each_possible_cpu(cpu).
+ *
+ * Return: true if the entry is found, false if all the lists exhausted
+ */
+static __always_inline bool
+__pcpu_list_next_cpu(struct pcpu_list_head *head, struct pcpu_list_state 
*state)
+{
+       if (state->lock)
+               spin_unlock(state->lock);
+next_cpu:
+       /*
+        * for_each_possible_cpu(cpu)
+        */
+       state->cpu = cpumask_next(state->cpu, cpu_possible_mask);
+       if (state->cpu>= nr_cpu_ids)
+               return false;   /* All the per-cpu lists iterated */
+
+       state->head =&per_cpu_ptr(head, state->cpu)->list;
+       state->lock =&per_cpu_ptr(head, state->cpu)->lock;
+       state->curr = list_entry(state->head->next,
+                                struct pcpu_list_node, list);
+       if (&state->curr->list == state->head)
+               goto next_cpu;
This might be more comprehensible as:

        if (list_empty(state->head))
                goto next_cpu;

and you can do it just after updating state->head (no need to init
state->lock&  state->curr if the list is empty).

Thank for the suggestion. Will change the code accordingly.
Another note: Initialization of state->curr is IMO racy - you need to hold
state->lock to grab state->curr reliably, don't you? Otherwise someone can
remove the entry while you are working with it. So you need to move that
down just before returning.

Right. I will move the initialization of state->curr after the spin_lock().

+
+       spin_lock(state->lock);
+       return true;
+}
+#endif /* NR_CPUS == 1 */
...

+/*
+ * Delete a node from a percpu list
+ *
+ * We need to check the lock pointer again after taking the lock to guard
+ * against concurrent delete of the same node. If the lock pointer changes
+ * (becomes NULL or to a different one), we assume that the deletion was done
+ * elsewhere.
+ */
+void pcpu_list_del(struct pcpu_list_node *node)
+{
+       spinlock_t *lock = READ_ONCE(node->lockptr);
+
+       if (unlikely(!lock))
+               return;
+
+       spin_lock(lock);
+       if (likely(lock == node->lockptr)) {
+               list_del_init(&node->list);
+               node->lockptr = NULL;
+       }
But someone changing lockptr under your hands would mean that there are
two processes racing to remove entries and that would generally point to a
problem (and likely use-after-free) in the caller, won't it? Or do you have
some particular usecase in mind?

                                                                Honza


This is just defensive programming to guard against unforeseen case. I don't have any particular use case in mind that will make that happen. Maybe I should put a WARN_ON if this really happens.

Cheers,
Longman

Reply via email to