"tracing_map->next_elt" in get_free_elt() is at risk of overflowing.
Once it overflows, new elements can still be inserted into the
tracing_map even though the maximum number of elements ("max_elts")
has been reached. Continuing to insert elements after the overflow
could result in the tracing_map containing "tracing_map->map_size"
elements, leaving no empty entries. If any attempt is then made to
insert an element into the full tracing_map using
__tracing_map_insert(), it will cause an infinite loop with preemption
disabled, leading to a CPU hang.

Fix this by preventing any further increments to
"tracing_map->next_elt" once it reaches "tracing_map->max_elts".

Co-developed-by: Cheng-Jui Wang <cheng-jui.w...@mediatek.com>
Signed-off-by: Cheng-Jui Wang <cheng-jui.w...@mediatek.com>
Signed-off-by: Tze-nan Wu <tze-nan...@mediatek.com>
---
We encountered this issue internally after enabling the
throttle_rss_stat feature provided by Perfetto in the background for
more than two days, during which the rss_stat tracepoint was invoked
over 2^32 times.

After tracing_map->next_elt overflowed, new elements could continue to
be inserted into the tracing_map belonging to rss_stat. The CPU could
then hang inside the while loop in __tracing_map_insert() once it was
called with no empty entry left in the tracing_map.

Call trace during the hang:
  __tracing_map_insert()
  tracing_map_insert()
  event_hist_trigger()
  event_triggers_call()
  __event_trigger_test_discard()
  trace_event_buffer_commit()
  trace_event_raw_event_rss_stat()
  __traceiter_rss_stat()
  trace_rss_stat()
  mm_trace_rss_stat()
  inc_mm_counter()
  do_swap_page()

throttle_rss_stat is essentially a synthetic event triggered by
rss_stat with a condition:

1. $ echo "rss_stat_throttled unsigned int mm_id unsigned int curr int member long size" >> /sys/kernel/tracing/synthetic_events

2. $ echo 'hist:keys=mm_id,member:bucket=size/0x80000:onchange($bucket).rss_stat_throttled(mm_id,curr,member,size)' > /sys/kernel/tracing/events/kmem/rss_stat/trigger

The hang can also be reproduced easily by calling a custom trace event
myevent(u64 mycount) more than 2^32+(map_size-max_elts) times while its
histogram is enabled with the key set to the variable "mycount",
passing a different "mycount" argument on every call.

BTW, I added Cheng-Jui as Co-developed-by because we had a lot of
discussions while debugging this.
---
 kernel/trace/tracing_map.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/trace/tracing_map.c b/kernel/trace/tracing_map.c
index a4dcf0f24352..3a56e7c8aa4f 100644
--- a/kernel/trace/tracing_map.c
+++ b/kernel/trace/tracing_map.c
@@ -454,7 +454,7 @@ static struct tracing_map_elt *get_free_elt(struct tracing_map *map)
 	struct tracing_map_elt *elt = NULL;
 	int idx;
 
-	idx = atomic_inc_return(&map->next_elt);
+	idx = atomic_fetch_add_unless(&map->next_elt, 1, map->max_elts);
 	if (idx < map->max_elts) {
 		elt = *(TRACING_MAP_ELT(map->elts, idx));
 		if (map->ops && map->ops->elt_init)
@@ -699,7 +699,7 @@ void tracing_map_clear(struct tracing_map *map)
 {
 	unsigned int i;
 
-	atomic_set(&map->next_elt, -1);
+	atomic_set(&map->next_elt, 0);
 	atomic64_set(&map->hits, 0);
 	atomic64_set(&map->drops, 0);
 
@@ -783,7 +783,7 @@ struct tracing_map *tracing_map_create(unsigned int map_bits,
 
 	map->map_bits = map_bits;
 	map->max_elts = (1 << map_bits);
-	atomic_set(&map->next_elt, -1);
+	atomic_set(&map->next_elt, 0);
 
 	map->map_size = (1 << (map_bits + 1));
 	map->ops = ops;
-- 
2.18.0
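
As a note on the semantics the fix relies on: below is a minimal
userspace sketch (C11 atomics, not kernel code) of the saturating
counter that atomic_fetch_add_unless() gives get_free_elt(). The names
MAX_ELTS, next_elt and get_free_idx() are illustrative stand-ins for
map->max_elts, map->next_elt and get_free_elt(), not code from the
patch:

  #include <stdatomic.h>
  #include <stdio.h>

  #define MAX_ELTS 4              /* stand-in for map->max_elts */

  static atomic_int next_elt;     /* stand-in for map->next_elt */

  /*
   * Emulate atomic_fetch_add_unless(&next_elt, 1, MAX_ELTS): add 1
   * unless the counter already equals MAX_ELTS, and return the value
   * it held beforehand. Once saturated it stays at MAX_ELTS, unlike
   * the old unconditional atomic_inc_return(), which wraps around and
   * eventually passes the idx < max_elts check again.
   */
  static int get_free_idx(void)
  {
          int idx = atomic_load(&next_elt);

          while (idx < MAX_ELTS &&
                 !atomic_compare_exchange_weak(&next_elt, &idx, idx + 1))
                  ;       /* failed CAS refreshed idx; retry */

          return idx;     /* idx == MAX_ELTS means the map is full */
  }

  int main(void)
  {
          atomic_store(&next_elt, 0);     /* the new initial value */

          for (int i = 0; i < MAX_ELTS + 2; i++) {
                  int idx = get_free_idx();

                  printf("call %d: %s\n", i,
                         idx < MAX_ELTS ? "got element" : "full, refused");
          }
          return 0;
  }

Every call past the first MAX_ELTS ones sees idx == MAX_ELTS and is
refused, instead of the counter wrapping back below max_elts and
handing out element slots the map no longer has.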