On 2/13/18 5:11 PM, Daniel Borkmann wrote:
On 02/12/2018 10:58 PM, Yonghong Song wrote:
There is a memory leak happening in lpm_trie map_free callback
function trie_free. The trie structure itself does not get freed.
Also, trie_free function did not do synchronize_rcu before freeing
various data structures. This is incorrect as some rcu_read_lock
region(s) for lookup, update, delete or get_next_key may not complete yet.
The fix is to add synchronize_rcu in the beginning of trie_free.
The useless spin_lock is removed from this function as well.
Fixes: b95a5c4db09b ("bpf: add a longest prefix match trie map implementation")
Reported-by: Mathieu Malaterre <ma...@debian.org>
Reported-by: Alexei Starovoitov <a...@kernel.org>
Tested-by: Mathieu Malaterre <ma...@debian.org>
Signed-off-by: Yonghong Song <y...@fb.com>
kernel/bpf/lpm_trie.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c
index 7b469d1..9b41ea4 100644
@@ -555,7 +555,12 @@ static void trie_free(struct bpf_map *map)
struct lpm_trie_node __rcu **slot;
struct lpm_trie_node *node;
+ /* at this point bpf_prog->aux->refcnt == 0 and this map->refcnt == 0,
+ * so the programs (can be more than one that used this map) were
+ * disconnected from events. Wait for outstanding programs to complete
+ * update/lookup/delete/get_next_key and free the trie.
I think the first part of the comment is slightly misleading, e.g. map refcount
could drop to zero also for various other reasons, not strictly due to prog
refcnt dropping to 0, so I would just keep the second part with 'Wait for
Oh, yes. Make sense. Copy-paste without thinking.
We have similar comments in virtually all other places of using
synchronize_rcu under kernel/bpf. This can be cleaned up later though.
outstanding [...]' which is okay since this is eventually what is relevant in
/* Always start at the root and walk down to a node that has no
* children. Then free that node, nullify its reference in the parent
@@ -588,7 +593,7 @@ static void trie_free(struct bpf_map *map)
Could you do a minor change here: since we get rid of the locking as there's
no user left anymore after grace period, it would be great if you could also
change the 'unlock' label name above.
Other than that, good to go, thanks!
static int trie_get_next_key(struct bpf_map *map, void *_key, void *_next_key)