kou commented on PR #41163: URL: https://github.com/apache/arrow/pull/41163#issuecomment-2053787467
I've also investigated this more. I think that `--with-tls` isn't the cause of this. I could reproduce this with the following program, which only uses `libxml2` and `std::thread`:

```cxx
// g++ -fsanitize=address -g3 -O0 -o aaa aaa.cxx $(pkg-config --cflags --libs libxml-2.0) && ./aaa
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

#include <libxml/xmlreader.h>

int main(void) {
  xmlInitParser();
  std::mutex mutex;
  std::condition_variable variable;
  std::thread thread([&] {
    // Use the libxml2 API in this thread.
    xmlTextReaderPtr reader = xmlReaderForMemory("<root/>", 7, nullptr, nullptr, 0);
    xmlFreeTextReader(reader);
    variable.notify_one();
    {
      // Stay alive until the main thread has called xmlCleanupParser().
      std::unique_lock<std::mutex> lock(mutex);
      variable.wait(lock);
    }
  });
  {
    // Wait until the thread has used libxml2.
    std::unique_lock<std::mutex> lock(mutex);
    variable.wait(lock);
  }
  // The thread is still running here. It finishes only after this call.
  xmlCleanupParser();
  variable.notify_one();
  thread.join();
  return 0;
}
```

The point of this program is that a thread that used the libxml2 API finishes after `xmlCleanupParser()` is called.

This also happens in our test. A thread created by `arrow::internal::ThreadPool` finishes after the `xmlCleanupParser()` call made by the Azure SDK for C++. ( https://github.com/Azure/azure-sdk-for-cpp/blob/067d6acb3b2d1f82b5ad9d258050bc525941d501/sdk/storage/azure-storage-common/src/xml_wrapper.cpp#L398 )

Note that both of them are triggered by C++ destructors, which are called on process exit. `Azure::Storage::_internal::XmlGlobalInitializer::~XmlGlobalInitializer()` is called before `arrow::internal::ThreadPool::~ThreadPool` of `arrow::io::default_io_context()`.

If we shut down the threads of `arrow::internal::ThreadPool` explicitly before `Azure::Storage::_internal::XmlGlobalInitializer::~XmlGlobalInitializer()` runs, the leak isn't reported.
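The ordering issue described above can be sketched without libxml2 or Arrow. This is only an illustration under assumptions: `EventLog`, `FakeGlobalInitializer`, `FakePool`, and `RunScenario` are hypothetical names, the `"cleanup"` event stands in for `xmlCleanupParser()`, and `"thread-exit"` stands in for a worker thread finishing. Local objects are used instead of static ones so that the destruction order (reverse of declaration) is deterministic.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Thread-safe event log shared by the stand-ins below.
class EventLog {
 public:
  void Add(const std::string& event) {
    std::lock_guard<std::mutex> lock(mutex_);
    events_.push_back(event);
  }
  std::vector<std::string> Snapshot() {
    std::lock_guard<std::mutex> lock(mutex_);
    return events_;
  }

 private:
  std::mutex mutex_;
  std::vector<std::string> events_;
};

// Hypothetical stand-in for XmlGlobalInitializer: its destructor plays
// the role of the xmlCleanupParser() call.
class FakeGlobalInitializer {
 public:
  explicit FakeGlobalInitializer(EventLog* log) : log_(log) { assert(log_ != nullptr); }
  ~FakeGlobalInitializer() { log_->Add("cleanup"); }

 private:
  EventLog* log_;
};

// Hypothetical stand-in for a one-thread pool. The worker "uses the library"
// once and then stays alive until Shutdown() is called.
class FakePool {
 public:
  explicit FakePool(EventLog* log) : log_(log), thread_([this] { Run(); }) {
    // Block until the worker has recorded its "use" event.
    std::unique_lock<std::mutex> lock(mutex_);
    changed_.wait(lock, [this] { return used_; });
  }
  ~FakePool() { Shutdown(); }

  // Safe to call more than once: the second call finds nothing to join.
  void Shutdown() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      stop_ = true;
    }
    changed_.notify_all();
    if (thread_.joinable()) thread_.join();
  }

 private:
  void Run() {
    log_->Add("use");
    std::unique_lock<std::mutex> lock(mutex_);
    used_ = true;
    changed_.notify_all();
    changed_.wait(lock, [this] { return stop_; });
    lock.unlock();
    log_->Add("thread-exit");
  }

  EventLog* log_;
  std::mutex mutex_;
  std::condition_variable changed_;
  bool used_ = false;
  bool stop_ = false;
  std::thread thread_;  // declared last so the other members exist before it starts
};

// Returns the recorded event order for one scenario.
std::vector<std::string> RunScenario(bool shutdown_pool_first) {
  EventLog log;
  {
    FakePool pool(&log);
    FakeGlobalInitializer initializer(&log);  // destroyed before `pool`
    if (shutdown_pool_first) pool.Shutdown();
  }
  return log.Snapshot();
}
```

In this sketch, `RunScenario(false)` records the worker finishing after the cleanup, which mirrors the order seen in the failing test. `RunScenario(true)` records the worker finishing first, which corresponds to shutting the pool down explicitly before the global cleanup runs.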
For example:

```diff
diff --git a/cpp/src/arrow/filesystem/azurefs_test.cc b/cpp/src/arrow/filesystem/azurefs_test.cc
index ed09bfc2fa..d87e3e0731 100644
--- a/cpp/src/arrow/filesystem/azurefs_test.cc
+++ b/cpp/src/arrow/filesystem/azurefs_test.cc
@@ -58,6 +58,7 @@
 #include "arrow/util/logging.h"
 #include "arrow/util/pcg_random.h"
 #include "arrow/util/string.h"
+#include "arrow/util/thread_pool.h"
 #include "arrow/util/unreachable.h"
 #include "arrow/util/value_parsing.h"
 
@@ -371,6 +372,8 @@ class TestGeneric : public ::testing::Test, public GenericFileSystemTest {
     if (azure_fs_) {
       ASSERT_OK(azure_fs_->DeleteDir(container_name_));
     }
+    // Dirty
+    ASSERT_OK(reinterpret_cast<::arrow::internal::ThreadPool *>(io_context_->executor())->Shutdown());
   }
 
  protected:
@@ -379,7 +382,8 @@ class TestGeneric : public ::testing::Test, public GenericFileSystemTest {
     random::pcg32_fast rng((std::random_device()()));
     container_name_ = PreexistingData::RandomContainerName(rng);
     ASSERT_OK_AND_ASSIGN(auto options, MakeOptions(env_));
-    ASSERT_OK_AND_ASSIGN(azure_fs_, AzureFileSystem::Make(options));
+    io_context_ = std::make_unique<io::IOContext>();
+    ASSERT_OK_AND_ASSIGN(azure_fs_, AzureFileSystem::Make(options, *io_context_));
     ASSERT_OK(azure_fs_->CreateDir(container_name_, true));
     fs_ = std::make_shared<SubTreeFileSystem>(container_name_, azure_fs_);
   }
@@ -417,6 +421,7 @@ class TestGeneric : public ::testing::Test, public GenericFileSystemTest {
 
  private:
   std::string container_name_;
+  std::unique_ptr<io::IOContext> io_context_;
 };
 
 class TestAzuriteGeneric : public TestGeneric {
```

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org