justintime4tea opened a new issue, #24919: URL: https://github.com/apache/pulsar/issues/24919
### Search before reporting

- [x] I searched in the [issues](https://github.com/apache/pulsar/issues) and found nothing similar.

### What issue do you find in Pulsar docs?

I ran into a rather nasty-to-debug bug while writing integration tests for a Node.js application that uses the "pulsar-client" NPM package. Essentially, although "pulsar-client" appears to support multiple Pulsar client instances, the C++ side does not fully isolate them: a single `static LogCallback *logCallback` is shared by ALL clients, and each client overwrites this static callback with whatever handler the most recently executed `setLogCallbackHandler` call or `Pulsar.Client` constructor was given. This isn't really documented anywhere other than some comments I found in the C++ code.

## Why this is an issue

Because `static LogCallback *logCallback` is shared, there is a potential race condition: a final message from one of the C++ Pulsar clients may be logged through a handler that no longer exists, and the process crashes. Since neither `Close` nor the `~Client()` destructor sets `static Client::logCallback` to null on the C++ side, it can become a dangling pointer to a `ThreadSafeFunction` that belonged to a previous client which has since been destroyed.

## Repro

**Looking at the Client::Close method and the Client::~Client destructor:**

- `Client::~Client() {}` (the destructor is empty)
- `Napi::Value Client::Close(const Napi::CallbackInfo &info)` (this function only calls `pulsar_client_close_async`; it does not touch `Client::logCallback`)

**Conceptual repro steps** (a minimal standalone sketch of this sequence is included under "Any reference?" below):

1. You create Client A, which sets the static `Client::logCallback` to point to its logger.
2. You create Client B, which replaces the static `Client::logCallback` to point to its logger. (Client A's original logger function is now abandoned.)
3. You call `await ClientB.close()`. The `Close` method runs, and eventually `~Client()` is called.
4. Neither `Close` nor the destructor clears `logCallback`.
5. `logCallback` is now a dangling pointer, pointing to a `ThreadSafeFunction` that was part of Client B, which is now destroyed.
6. A background thread from Client A tries to log a message. It uses the static `logCallback`, accesses the dangling pointer, and the process crashes.

### What is your suggestion?

I suggest that, at a bare minimum and barring the C++ code being corrected, the documentation be updated so that consumers of the Pulsar client libraries are made aware of this. I'm not sure exactly how to phrase it, because it doesn't seem like you'd be safe even setting the handler a single time, given how the `Close` and destructor code behaves on the C++ side. (A sketch of one possible C++-side correction is also included below.)

### Any reference?
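Below is a minimal, standalone C++ sketch of the pattern described in the repro steps above. It is not the actual `pulsar-client` addon source: `Logger` is a stand-in for the per-client `Napi::ThreadSafeFunction`, and `backgroundLog` stands in for `LogMessageProxy` being invoked from a background thread. The last call in `main` deliberately dereferences the dangling pointer (undefined behavior), mirroring step 6:

```cpp
// Standalone sketch of the failure mode (NOT the pulsar-client addon source).
// "Logger" stands in for the per-client Napi::ThreadSafeFunction and
// "backgroundLog" stands in for LogMessageProxy running on a background thread.
#include <iostream>
#include <memory>
#include <string>
#include <utility>

struct Logger {
    explicit Logger(std::string o) : owner(std::move(o)) {}
    void log(const std::string& msg) const {
        std::cout << "[" << owner << "] " << msg << "\n";
    }
    std::string owner;
};

class Client {
public:
    explicit Client(std::string name) : logger_(std::make_unique<Logger>(std::move(name))) {
        // Mirrors the addon: every constructor overwrites the single shared static.
        logCallback = logger_.get();
    }
    // Mirrors Client::~Client() {}: the shared static is never reset, so it can be
    // left pointing at a Logger that the unique_ptr below is about to destroy.
    ~Client() = default;

    static void backgroundLog(const std::string& msg) {
        // Any client's background thread funnels through the one shared static.
        if (logCallback != nullptr) {
            logCallback->log(msg);  // dangling if the owning Client was already destroyed
        }
    }

private:
    static Logger* logCallback;       // ONE pointer shared by ALL Client instances
    std::unique_ptr<Logger> logger_;  // owned by this instance only
};

Logger* Client::logCallback = nullptr;

int main() {
    Client a("client-A");                          // step 1: logCallback -> A's logger
    {
        Client b("client-B");                      // step 2: logCallback -> B's logger (A's abandoned)
    }                                              // steps 3-5: B destroyed, logCallback NOT cleared
    Client::backgroundLog("late message from A");  // step 6: use-after-free, undefined behavior / crash
}
```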
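For completeness, here is a sketch of one shape a C++-side correction could take, as alluded to in the suggestion above: reset the shared static when the owning instance is destroyed, and guard every access with a mutex so a background log call can never dereference a freed handler. All names here (`SafeClient`, `logMutex`) are invented for illustration; this is not a proposed patch to the actual addon:

```cpp
// Illustrative only (invented names, not a patch to the actual addon): reset the
// shared static when its owner is destroyed and guard every access with a mutex.
#include <iostream>
#include <memory>
#include <mutex>
#include <string>
#include <utility>

struct Logger {
    explicit Logger(std::string o) : owner(std::move(o)) {}
    void log(const std::string& msg) const {
        std::cout << "[" << owner << "] " << msg << "\n";
    }
    std::string owner;
};

class SafeClient {
public:
    explicit SafeClient(std::string name) : logger_(std::make_unique<Logger>(std::move(name))) {
        std::lock_guard<std::mutex> lock(logMutex);
        logCallback = logger_.get();
    }
    ~SafeClient() {
        std::lock_guard<std::mutex> lock(logMutex);
        // Only reset the shared static if it still points at *this* instance's
        // logger; another client may have installed its own handler since.
        if (logCallback == logger_.get()) {
            logCallback = nullptr;
        }
    }

    static void backgroundLog(const std::string& msg) {
        std::lock_guard<std::mutex> lock(logMutex);
        if (logCallback != nullptr) {
            logCallback->log(msg);  // never dangling: cleared under the same lock
        }
    }

private:
    static std::mutex logMutex;
    static Logger* logCallback;
    std::unique_ptr<Logger> logger_;
};

std::mutex SafeClient::logMutex;
Logger* SafeClient::logCallback = nullptr;

int main() {
    SafeClient a("client-A");
    {
        SafeClient b("client-B");               // replaces A's handler with B's
    }                                           // B's destructor resets the static to null
    SafeClient::backgroundLog("late message");  // safe no-op, but A's handler is still lost
}
```

Even with this change, Client A's handler is still silently replaced and then dropped once Client B goes away, so later messages from Client A are lost rather than crashing; that limitation is part of why a documentation note about the log handler being shared process-wide seems warranted either way.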
**Diagnostics**

Though I don't have a code repro, I do have a breakdown of how I stumbled upon this. In integration tests where many Pulsar clients are created and closed, alongside the Node.js app under test creating/closing its own Pulsar client, the race condition is triggered and a SIGABRT appears near the end of the test output with the following message:

```
libc++abi: terminating due to uncaught exception of type Napi::Error:
```

Running the tests under lldb to capture the core dump produces the following, which led me to the issue:

```
frame #8: 0x00000001922fbbd8 libc++abi.dylib`__cxa_throw + 92
frame #9: 0x0000000151a485d4 pulsar.node`Napi::Function::Call(std::initializer_list<napi_value__*> const&) const + 196
frame #10: 0x0000000151a4fc10 pulsar.node`LogMessageProxy(Napi::Env, Napi::Function, LogMessage*) + 272
frame #11: 0x0000000151a4c1ec pulsar.node`Napi::ThreadSafeFunction::CallJS(napi_env__*, napi_value__*, void*, void*) + 68
frame #12: 0x00000001000c6b94 node`v8impl::(anonymous namespace)::ThreadSafeFunction::AsyncCb(uv_async_s*) + 472
```

For whatever reason, this only occurs on Node versions below v24. I tested many v22.x and v23.x versions, but I can't seem to reproduce the issue with Node v24.

The full dump for thread #1:

```
* thread #1
  * frame #0: 0x000000019230a388 libsystem_kernel.dylib`__pthread_kill + 8
    frame #1: 0x000000019234388c libsystem_pthread.dylib`pthread_kill + 296
    frame #2: 0x000000019224ca3c libsystem_c.dylib`abort + 124
    frame #3: 0x00000001922f9384 libc++abi.dylib`abort_message + 132
    frame #4: 0x00000001922e7cd8 libc++abi.dylib`demangling_terminate_handler() + 316
    frame #5: 0x0000000191f6cde4 libobjc.A.dylib`_objc_terminate() + 172
    frame #6: 0x00000001922f8698 libc++abi.dylib`std::__terminate(void (*)()) + 16
    frame #7: 0x00000001922fbc30 libc++abi.dylib`__cxxabiv1::failed_throw(__cxxabiv1::__cxa_exception*) + 88
    frame #8: 0x00000001922fbbd8 libc++abi.dylib`__cxa_throw + 92
    frame #9: 0x0000000151a485d4 pulsar.node`Napi::Function::Call(std::initializer_list<napi_value__*> const&) const + 196
    frame #10: 0x0000000151a4fc10 pulsar.node`LogMessageProxy(Napi::Env, Napi::Function, LogMessage*) + 272
    frame #11: 0x0000000151a4c1ec pulsar.node`Napi::ThreadSafeFunction::CallJS(napi_env__*, napi_value__*, void*, void*) + 68
    frame #12: 0x00000001000c6b94 node`v8impl::(anonymous namespace)::ThreadSafeFunction::AsyncCb(uv_async_s*) + 472
    frame #13: 0x0000000100f01478 node`uv__async_io + 304
    frame #14: 0x0000000100f1524c node`uv__io_poll + 1512
    frame #15: 0x0000000100f019e0 node`uv_run + 408
    frame #16: 0x0000000100093004 node`node::Environment::CleanupHandles() + 188
    frame #17: 0x000000010009377c node`node::Environment::RunCleanup() + 292
    frame #18: 0x0000000100009ff0 node`node::FreeEnvironment(node::Environment*) + 128
    frame #19: 0x000000010014d26c node`node::NodeMainInstance::Run() + 304
    frame #20: 0x00000001000bf5f4 node`node::Start(int, char**) + 600
    frame #21: 0x0000000191fa2b98 dyld`start + 6076
```

### Are you willing to submit a PR?

- [ ] I'm willing to submit a PR!
