https://github.com/JDevlieghere created https://github.com/llvm/llvm-project/pull/137920
We've gotten multiple reports of the launch and attach test being flaky, both in CI and locally when running the tests. I believe the flakiness is due to a race between the main thread and the event handler thread. Launching and attaching is done in synchronous mode so the corresponding requests return only after the respective operation has been completed. In synchronous mode, no stop event is emitted. When we have launch or attach commands, we cannot use synchronous mode. To hide the stop events in this case, the default event handler thread was ignoring stop events before the "configuration done" request was answered. The problem with that is that there's no guarantee that we have handled the stop event before we have handled the configuration done request. When looking at the logs, you can see that we're still in the process of sending module events (which I recently added) when we receive, and respond to the "configuration done" request, before it sees the launch or attach stop event. At that point `dap.configuration_done_sent` is true and the event is sent, which the test doesn't expect. This PR fixes the raciness by using an atomic flag to signal that the next stop event should be ignored. An alternative approach could be to stop trying to hide the initial stop event, and instead report it to the client unconditionally. Instead of ignoring the stop for the asynchronous case, we could send a stop event after we're done handling the synchronous case. Fixes #137660 >From bedc08b135c4eb7817b0472fe71a9c201161b430 Mon Sep 17 00:00:00 2001 From: Jonas Devlieghere <jo...@devlieghere.com> Date: Tue, 29 Apr 2025 21:14:28 -0700 Subject: [PATCH] [lldb-dap] Fix raciness in launch and attach tests We've gotten multiple reports of the launch and attach test being flaky, both in CI and locally when running the tests. I believe the flakiness is due to a race between the main thread and the event handler thread. Launching and attaching is done in synchronous mode so the corresponding requests return only after the respective operation has been completed. In synchronous mode, no stop event is emitted. When we have launch or attach commands, we cannot use synchronous mode. To hide the stop events in this case, the default event handler thread was ignoring stop events before the "configuration done" request was answered. The problem with that is that there's no guarantee that we have handled the stop event before we have handled the configuration done request. When looking at the logs, you can see that we're still in the process of sending module events (which I recently added) when we receive, and respond to the "configuration done" request, before it sees the launch or attach stop event. At that point `dap.configuration_done_sent` is true and the event is sent, which the test doesn't expect. This PR fixes the raciness by using an atomic flag to signal that the next stop event should be ignored. An alternative approach could be to stop trying to hide the initial stop event, and instead report it to the client unconditionally. Instead of ignoring the stop for the asynchronous case, we could send a stop event after we're done handling the synchronous case. Fixes #137660 --- .../Python/lldbsuite/test/tools/lldb-dap/lldbdap_testcase.py | 2 +- lldb/test/API/tools/lldb-dap/attach/TestDAP_attach.py | 1 - .../test/API/tools/lldb-dap/attach/TestDAP_attachByPortNum.py | 1 - lldb/tools/lldb-dap/DAP.cpp | 3 ++- lldb/tools/lldb-dap/DAP.h | 1 + lldb/tools/lldb-dap/Handler/AttachRequestHandler.cpp | 4 ++++ lldb/tools/lldb-dap/Handler/InitializeRequestHandler.cpp | 2 +- lldb/tools/lldb-dap/Handler/RequestHandler.cpp | 1 + 8 files changed, 10 insertions(+), 5 deletions(-) diff --git a/lldb/packages/Python/lldbsuite/test/tools/lldb-dap/lldbdap_testcase.py b/lldb/packages/Python/lldbsuite/test/tools/lldb-dap/lldbdap_testcase.py index ee5272850b9a8..0d81b7d80102f 100644 --- a/lldb/packages/Python/lldbsuite/test/tools/lldb-dap/lldbdap_testcase.py +++ b/lldb/packages/Python/lldbsuite/test/tools/lldb-dap/lldbdap_testcase.py @@ -103,7 +103,7 @@ def verify_breakpoint_hit(self, breakpoint_ids): match_desc = "breakpoint %s." % (breakpoint_id) if match_desc in description: return - self.assertTrue(False, "breakpoint not hit") + self.assertTrue(False, f"breakpoint not hit: stopped_events={stopped_events}") def verify_stop_exception_info(self, expected_description, timeout=timeoutval): """Wait for the process we are debugging to stop, and verify the stop diff --git a/lldb/test/API/tools/lldb-dap/attach/TestDAP_attach.py b/lldb/test/API/tools/lldb-dap/attach/TestDAP_attach.py index 6f70316821c8c..dcdfada2ff4c2 100644 --- a/lldb/test/API/tools/lldb-dap/attach/TestDAP_attach.py +++ b/lldb/test/API/tools/lldb-dap/attach/TestDAP_attach.py @@ -25,7 +25,6 @@ def spawn_and_wait(program, delay): process.wait() -@skipIf class TestDAP_attach(lldbdap_testcase.DAPTestCaseBase): def set_and_hit_breakpoint(self, continueToExit=True): source = "main.c" diff --git a/lldb/test/API/tools/lldb-dap/attach/TestDAP_attachByPortNum.py b/lldb/test/API/tools/lldb-dap/attach/TestDAP_attachByPortNum.py index 51f62b79f3f4f..152e504af6d14 100644 --- a/lldb/test/API/tools/lldb-dap/attach/TestDAP_attachByPortNum.py +++ b/lldb/test/API/tools/lldb-dap/attach/TestDAP_attachByPortNum.py @@ -19,7 +19,6 @@ import socket -@skip class TestDAP_attachByPortNum(lldbdap_testcase.DAPTestCaseBase): default_timeout = 20 diff --git a/lldb/tools/lldb-dap/DAP.cpp b/lldb/tools/lldb-dap/DAP.cpp index b593353110787..e191d8f2d3745 100644 --- a/lldb/tools/lldb-dap/DAP.cpp +++ b/lldb/tools/lldb-dap/DAP.cpp @@ -85,7 +85,8 @@ DAP::DAP(Log *log, const ReplMode default_repl_mode, exception_breakpoints(), focus_tid(LLDB_INVALID_THREAD_ID), stop_at_entry(false), is_attach(false), restarting_process_id(LLDB_INVALID_PROCESS_ID), - configuration_done_sent(false), waiting_for_run_in_terminal(false), + configuration_done_sent(false), ignore_next_stop(false), + waiting_for_run_in_terminal(false), progress_event_reporter( [&](const ProgressEvent &event) { SendJSON(event.ToJSON()); }), reverse_request_seq(0), repl_mode(default_repl_mode) { diff --git a/lldb/tools/lldb-dap/DAP.h b/lldb/tools/lldb-dap/DAP.h index 88eedb0860cf1..40d6400765a8d 100644 --- a/lldb/tools/lldb-dap/DAP.h +++ b/lldb/tools/lldb-dap/DAP.h @@ -189,6 +189,7 @@ struct DAP { // the old process here so we can detect this case and keep running. lldb::pid_t restarting_process_id; bool configuration_done_sent; + std::atomic<bool> ignore_next_stop; llvm::StringMap<std::unique_ptr<BaseRequestHandler>> request_handlers; bool waiting_for_run_in_terminal; ProgressEventReporter progress_event_reporter; diff --git a/lldb/tools/lldb-dap/Handler/AttachRequestHandler.cpp b/lldb/tools/lldb-dap/Handler/AttachRequestHandler.cpp index 3ef87cbef873c..5778ae53c9b0b 100644 --- a/lldb/tools/lldb-dap/Handler/AttachRequestHandler.cpp +++ b/lldb/tools/lldb-dap/Handler/AttachRequestHandler.cpp @@ -155,6 +155,9 @@ void AttachRequestHandler::operator()(const llvm::json::Object &request) const { std::string connect_url = llvm::formatv("connect://{0}:", gdb_remote_hostname); connect_url += std::to_string(gdb_remote_port); + // Connect remote will generate a stopped event even in synchronous + // mode. + dap.ignore_next_stop = true; dap.target.ConnectRemote(listener, connect_url.c_str(), "gdb-remote", error); } else { @@ -166,6 +169,7 @@ void AttachRequestHandler::operator()(const llvm::json::Object &request) const { // Reenable async events dap.debugger.SetAsync(true); } else { + dap.ignore_next_stop = true; // We have "attachCommands" that are a set of commands that are expected // to execute the commands after which a process should be created. If there // is no valid process after running these commands, we have failed. diff --git a/lldb/tools/lldb-dap/Handler/InitializeRequestHandler.cpp b/lldb/tools/lldb-dap/Handler/InitializeRequestHandler.cpp index ce34c52bcc334..c53d7d5e2febf 100644 --- a/lldb/tools/lldb-dap/Handler/InitializeRequestHandler.cpp +++ b/lldb/tools/lldb-dap/Handler/InitializeRequestHandler.cpp @@ -167,7 +167,7 @@ static void EventThreadFunction(DAP &dap) { // stop events which we do not want to send an event for. We will // manually send a stopped event in request_configurationDone(...) // so don't send any before then. - if (dap.configuration_done_sent) { + if (!dap.ignore_next_stop.exchange(false)) { // Only report a stopped event if the process was not // automatically restarted. if (!lldb::SBProcess::GetRestartedFromEvent(event)) { diff --git a/lldb/tools/lldb-dap/Handler/RequestHandler.cpp b/lldb/tools/lldb-dap/Handler/RequestHandler.cpp index b7d3c8ced69f1..e7601a929aa6c 100644 --- a/lldb/tools/lldb-dap/Handler/RequestHandler.cpp +++ b/lldb/tools/lldb-dap/Handler/RequestHandler.cpp @@ -250,6 +250,7 @@ llvm::Error BaseRequestHandler::LaunchProcess( if (error.Fail()) return llvm::make_error<DAPError>(error.GetCString()); } else { + dap.ignore_next_stop = true; // Set the launch info so that run commands can access the configured // launch details. dap.target.SetLaunchInfo(launch_info); _______________________________________________ lldb-commits mailing list lldb-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits