On 2026/5/14 09:50, Alexei Starovoitov wrote:
> On Wed May 13, 2026 at 3:53 PM PDT, Eduard Zingerman wrote:
>> On Tue, 2026-05-12 at 06:41 +0000, [email protected] wrote:
>>
>> [...]
>>
>>> When a BPF program holds an owning or refcount-acquired reference to
>>> one of these nodes (node X), which is structurally supported because
>>> __bpf_obj_drop_impl() uses refcount_dec_and_test() and only frees at
>>> refcount 0, a concurrent push to a DIFFERENT bpf_list_head becomes a
>>> corruption:
>>>
>>> CPU 0 (bpf_list_head_free, lock released)  CPU 1 (BPF prog, refcount X)
>>> -----------------------------------------  ----------------------------
>>> (owner of X == NULL, X linked in drain)
>>>                                            bpf_list_push_back(other, X)
>>>                                              __bpf_list_add: spin_lock()
>>>                                              cmpxchg(X->owner, NULL,
>>>                                                      POISON) -> OK
>>>                                              list_add_tail(&X->list_head,
>>>                                                            other_head)
>>>                                              -> overwrites X->next,
>>>                                                 X->prev, corrupts
>>>                                                 other_head's chain
>>>                                                 because X is still
>>>                                                 stitched into drain
>>> pos = drain.next;
>>>   (may be X, or a neighbor reached via X's stale next)
>>> list_del_init(pos);
>>>   reads X->next/prev, which now point into other_head;
>>>   corrupts other_head's list and/or drain
>>
>>
>> Kaitao, this scenario seems plausible, could you please comment on it?
>
> I think the bot is correct.
> This patch looks buggy.
> It seems to me an optimization that breaks the concurrency logic.
> Maybe just drop this patch and reorder the other one, so that the bot
> sees the nonown suffix logic first.
This patch is still necessary because it addresses the problem discussed
in this thread:
https://lore.kernel.org/all/[email protected]/
The patch does have a bug, however. To fix the issues we are seeing now,
I propose the additional changes below and would appreciate feedback.
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -2263,8 +2263,10 @@ void bpf_list_head_free(const struct btf_field *field, void *list_head,
 	if (!head->next || list_empty(head))
 		goto unlock;
 	list_for_each_safe(pos, n, head) {
-		WRITE_ONCE(container_of(pos,
-			struct bpf_list_node_kern, list_head)->owner, NULL);
+		struct bpf_list_node_kern *node;
+
+		node = container_of(pos, struct bpf_list_node_kern, list_head);
+		WRITE_ONCE(node->owner, BPF_PTR_POISON);
 		list_move_tail(pos, &drain);
 	}
 unlock:
@@ -2272,8 +2274,12 @@ void bpf_list_head_free(const struct btf_field *field, void *list_head,
 	__bpf_spin_unlock_irqrestore(spin_lock);
 	while (!list_empty(&drain)) {
+		struct bpf_list_node_kern *node;
+
 		pos = drain.next;
+		node = container_of(pos, struct bpf_list_node_kern, list_head);
 		list_del_init(pos);
+		WRITE_ONCE(node->owner, NULL);
 		/* The contained type can also have resources, including a
 		 * bpf_list_head which needs to be freed.
 		 */
@@ -2481,6 +2487,14 @@ static int __bpf_list_add(struct bpf_list_node_kern *node,
 	if (unlikely(!h->next))
 		INIT_LIST_HEAD(h);
+	/* bpf_list_head_free() marks nodes being detached with BPF_PTR_POISON
+	 * before list_del_init(). cmpxchg(NULL, POISON) below would fail with
+	 * that old value and fall into the generic error path, which wrongly
+	 * calls __bpf_obj_drop_impl(). Reject POISON up front instead.
+	 */
+	if (READ_ONCE(node->owner) == BPF_PTR_POISON)
+		return -EINVAL;
+
 	/* node->owner != NULL implies !list_empty(n), no need to separately
 	 * check the latter
 	 */
--
Thanks
Kaitao Cheng