On 2025/11/22 10:34, Alexei Starovoitov wrote:
> On Mon, Nov 17, 2025 at 8:22 AM Leon Hwang <[email protected]> wrote:
>>
[...]
>> +
>> +		/* lookup then check value on CPUs */
>> +		for (j = 0; j < nr_cpus; j++) {
>> +			flags = (u64)j << 32 | BPF_F_CPU;
>> +			err = bpf_map__lookup_elem(map, keys + i * key_sz, key_sz, values,
>> +						   value_sz, flags);
>> +			if (!ASSERT_OK(err, "bpf_map__lookup_elem specified cpu"))
>> +				goto out;
>> +			if (!ASSERT_EQ(values[0], j != cpu ? 0 : value,
>> +				       "bpf_map__lookup_elem value on specified cpu"))
>> +				goto out;
>
> I was about to apply it, but noticed that the test is unstable.
> It fails 1 out of 10 for me in the above line.
> test_percpu_map_op_cpu_flag:PASS:bpf_map_lookup_batch value on specified cpu 0 nsec
> test_percpu_map_op_cpu_flag:FAIL:bpf_map_lookup_batch value on specified cpu unexpected bpf_map_lookup_batch value on specified cpu: actual 0 != expected 3735929054
> #261/15 percpu_alloc/cpu_flag_lru_percpu_hash:FAIL
> #261 percpu_alloc:FAIL
>
> Please investigate what is going on.
>
I was able to reproduce the failure on a 16-core VM.

It appears to be caused by LRU eviction: once I increased max_entries of
the lru_percpu_hash map to libbpf_num_possible_cpus(), the failure no
longer reproduced.

I'll need to spend more time investigating the exact eviction behavior
and why it only shows up intermittently in this test.
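For reference, a minimal sketch of the sizing workaround described above.
It is not the final fix. lru_percpu_max_entries() is a hypothetical helper
name that does not appear in the patch, and nr_cpus stands in for the
return value of libbpf_num_possible_cpus(). Applying the result with
bpf_map__set_max_entries() before the object is loaded is an assumption
about where such a change would go.

```c
/*
 * Hypothetical helper (not part of the patch): never size the
 * lru_percpu_hash map below the number of possible CPUs.
 *
 * nr_cpus is expected to come from libbpf_num_possible_cpus(), which
 * returns a negative error code on failure.
 */
static unsigned int lru_percpu_max_entries(unsigned int wanted, int nr_cpus)
{
	if (nr_cpus <= 0)
		return wanted;	/* CPU count unknown, keep the requested size */

	return wanted < (unsigned int)nr_cpus ? (unsigned int)nr_cpus : wanted;
}

/*
 * Assumed placement in the selftest, before the skeleton is loaded:
 *
 *	bpf_map__set_max_entries(map,
 *		lru_percpu_max_entries(bpf_map__max_entries(map),
 *				       libbpf_num_possible_cpus()));
 *
 * Kept as a comment so this block builds without libbpf.
 */
```

On a 16-CPU machine, any request below 16 entries is raised to 16, and
larger requests are left alone. This only hides the symptom; it does not
explain why eviction triggers in the first place.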
Thanks,
Leon