A race between sessiond teardown and application initialization can lead to a deadlock.

Applications try to communicate via the notify sockets while sessiond
no longer listens on these sockets, since the thread responsible for
reception and response (ust_thread_manage_notify) has terminated. These
sockets are never closed, hence an application can hang on
communication.

The sessiond hang happens during the call to cmd_destroy_session in
sessiond_cleanup: sessiond is trying to communicate with the app while
the app is waiting for a response on the app notification socket.

To prevent this situation, ust_app_notify_sock_unregister is called on
every entry of the ust_app_ht_by_notify_sock hash table at the time of
termination. This ensures that any pending communication initiated by
the application is terminated, since all sockets are closed at the end
of the grace period via call_rcu inside ust_app_notify_sock_unregister.

Using ust_app_ht_by_notify_sock instead of ust_app_ht prevents a double
call_rcu, since entries are removed from ust_app_ht_by_notify_sock
during ust_app_notify_sock_unregister.

This can be reproduced using the sessiond_teardown_active_session
scenario provided by [1].

[1] https://github.com/PSRCode/lttng-stress

Signed-off-by: Jonathan Rajotte <[email protected]>
---
 src/bin/lttng-sessiond/ust-thread.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/src/bin/lttng-sessiond/ust-thread.c b/src/bin/lttng-sessiond/ust-thread.c
index 1e7a8229..8f11133a 100644
--- a/src/bin/lttng-sessiond/ust-thread.c
+++ b/src/bin/lttng-sessiond/ust-thread.c
@@ -27,6 +27,19 @@
 #include "health-sessiond.h"
 #include "testpoint.h"
 
+
+static
+void notify_sock_unregister_all()
+{
+	struct lttng_ht_iter iter;
+	struct ust_app *app;
+	rcu_read_lock();
+	cds_lfht_for_each_entry(ust_app_ht_by_notify_sock->ht, &iter.iter, app, notify_sock_n.node) {
+		ust_app_notify_sock_unregister(app->notify_sock);
+	}
+	rcu_read_unlock();
+}
+
 /*
  * This thread manage application notify communication.
  */
@@ -53,7 +66,7 @@ void *ust_thread_manage_notify(void *data)
 
 	ret = lttng_poll_create(&events, 2, LTTNG_CLOEXEC);
 	if (ret < 0) {
-		goto error;
+		goto error_poll_create;
 	}
 
 	/* Add quit pipe */
@@ -197,6 +210,8 @@ error_poll_create:
 error_testpoint:
 	utils_close_pipe(apps_cmd_notify_pipe);
 	apps_cmd_notify_pipe[0] = apps_cmd_notify_pipe[1] = -1;
+	notify_sock_unregister_all();
+
 	DBG("Application notify communication apps thread cleanup complete");
 	if (err) {
 		health_error();
-- 
2.11.0

_______________________________________________
lttng-dev mailing list
[email protected]
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
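
The double call_rcu argument in the commit message can be illustrated with a
small toy model. This is only a sketch under stated assumptions, not the
sessiond code: every toy_* name is hypothetical, a plain array stands in for
ust_app_ht_by_notify_sock, and a counter stands in for a queued call_rcu
callback. It shows that when unregistering removes the entry from the table
being iterated, a later unregister-all pass cannot queue a second deferred
close for the same socket.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_APPS 8

/* Hypothetical stand-in for struct ust_app; only what the sketch needs. */
struct toy_app {
	int notify_sock;      /* pretend file descriptor */
	bool in_notify_table; /* still present in the "by notify sock" table */
	int queued_closes;    /* stand-in for queued call_rcu callbacks */
	bool closed;          /* set once the deferred close has run */
};

static struct toy_app toy_apps[TOY_MAX_APPS];
static size_t toy_app_count;

/* Add an app to the toy table; returns false when the table is full. */
static bool toy_register(int notify_sock)
{
	if (toy_app_count == TOY_MAX_APPS) {
		return false;
	}
	toy_apps[toy_app_count++] = (struct toy_app) {
		.notify_sock = notify_sock,
		.in_notify_table = true,
	};
	return true;
}

/*
 * Assumed shape of the unregister step: remove the entry from the table,
 * then queue exactly one deferred close. Returns true if a close was queued.
 */
static bool toy_notify_sock_unregister(int notify_sock)
{
	for (size_t i = 0; i < toy_app_count; i++) {
		struct toy_app *app = &toy_apps[i];

		if (app->in_notify_table && app->notify_sock == notify_sock) {
			app->in_notify_table = false;
			app->queued_closes++;
			return true;
		}
	}
	return false; /* Already removed: nothing is queued a second time. */
}

/* Walk only the entries still in the table, as the patch does at teardown. */
static size_t toy_notify_sock_unregister_all(void)
{
	size_t queued = 0;

	for (size_t i = 0; i < toy_app_count; i++) {
		if (toy_apps[i].in_notify_table &&
				toy_notify_sock_unregister(toy_apps[i].notify_sock)) {
			queued++;
		}
	}
	return queued;
}

/* Stand-in for the end of the grace period: run the queued closes. */
static size_t toy_grace_period_end(void)
{
	size_t closed = 0;

	for (size_t i = 0; i < toy_app_count; i++) {
		struct toy_app *app = &toy_apps[i];

		if (app->queued_closes > 0 && !app->closed) {
			/* A second queued close would be a double free/close. */
			assert(app->queued_closes == 1);
			app->closed = true;
			closed++;
		}
	}
	return closed;
}
```

In this model, registering three sockets, unregistering one of them early,
and then calling toy_notify_sock_unregister_all() queues a close only for the
two remaining entries. Iterating a table that still listed the early entry
would queue its close twice, which is the situation the commit message says
the choice of ust_app_ht_by_notify_sock avoids.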
