kou commented on PR #41163:
URL: https://github.com/apache/arrow/pull/41163#issuecomment-2053787467

   I've also investigated this more.
   
   I think that `--with-tls` isn't the cause of this.
   
   I could reproduce this with the following program, which uses only `libxml2` and `std::thread`:
   
   ```cxx
   // g++ -fsanitize=address -g3 -O0 -o aaa aaa.cxx $(pkg-config --cflags --libs libxml-2.0) && ./aaa
   
   #include <chrono>
   #include <condition_variable>
   #include <mutex>
   #include <thread>
   
   #include <libxml/xmlreader.h>
   
   int main(void) {
     xmlInitParser();
     std::mutex mutex;
     std::condition_variable variable;
     std::thread thread([&] {
       xmlTextReaderPtr reader =
         xmlReaderForMemory("<root/>", 7, nullptr, nullptr, 0);
       xmlFreeTextReader(reader);
       variable.notify_one();
       {
         std::unique_lock<std::mutex> lock(mutex);
         variable.wait(lock);
       }
     });
     {
       std::unique_lock<std::mutex> lock(mutex);
       variable.wait(lock);
     }
     xmlCleanupParser();
     variable.notify_one();
     thread.join();
     return 0;
   }
   ```
   
   The point of this program is that a thread that has used the libxml2 API finishes after `xmlCleanupParser()` is called.
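
The ordering that matters here can be sketched without libxml2. The following is only an illustration under stated assumptions: `EventLog` and `RunHandshake` are hypothetical names, and the log entries stand in for the real libxml2 calls. It also uses predicate-guarded waits, which the reproducer above does not, so that a notification sent before the other side starts waiting is not lost.

```cpp
#include <condition_variable>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical helper: records events in the order they happen.
struct EventLog {
  std::mutex mutex;
  std::vector<std::string> events;
  void Add(const std::string& event) {
    std::lock_guard<std::mutex> lock(mutex);
    events.push_back(event);
  }
};

// Hypothetical sketch: the worker uses the library, the main thread cleans up
// the library, and only then does the worker thread exit.
std::vector<std::string> RunHandshake() {
  EventLog log;
  std::mutex mutex;
  std::condition_variable variable;
  bool library_used = false;  // set by the worker thread
  bool cleanup_done = false;  // set by the main thread

  std::thread thread([&] {
    // Stands in for xmlReaderForMemory() + xmlFreeTextReader().
    log.Add("worker: use library");
    std::unique_lock<std::mutex> lock(mutex);
    library_used = true;
    variable.notify_all();
    // Keep the thread alive until the main thread has cleaned up.
    variable.wait(lock, [&] { return cleanup_done; });
    log.Add("worker: exit");
  });
  {
    // Wait until the worker has used the library.
    std::unique_lock<std::mutex> lock(mutex);
    variable.wait(lock, [&] { return library_used; });
  }
  // Stands in for xmlCleanupParser().
  log.Add("main: cleanup");
  {
    std::lock_guard<std::mutex> lock(mutex);
    cleanup_done = true;
  }
  variable.notify_all();
  thread.join();
  return log.events;
}
```

Calling `RunHandshake()` returns the events in the order "worker: use library", "main: cleanup", "worker: exit", which is the same ordering the reproducer forces with the real libxml2 calls.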
   
   This also happens in our test. A thread created by `arrow::internal::ThreadPool` finishes after an `xmlCleanupParser()` call made by the Azure SDK for C++ (https://github.com/Azure/azure-sdk-for-cpp/blob/067d6acb3b2d1f82b5ad9d258050bc525941d501/sdk/storage/azure-storage-common/src/xml_wrapper.cpp#L398).
   
   Note that both of them are triggered by C++ destructors, which run on process exit. `Azure::Storage::_internal::XmlGlobalInitializer::~XmlGlobalInitializer()` is called before `arrow::internal::ThreadPool::~ThreadPool()` of `arrow::io::default_io_context()`.
   
   If we explicitly shut down the threads of `arrow::internal::ThreadPool` before `Azure::Storage::_internal::XmlGlobalInitializer::~XmlGlobalInitializer()` runs, the leak isn't reported. For example:
   
   ```diff
   diff --git a/cpp/src/arrow/filesystem/azurefs_test.cc b/cpp/src/arrow/filesystem/azurefs_test.cc
   index ed09bfc2fa..d87e3e0731 100644
   --- a/cpp/src/arrow/filesystem/azurefs_test.cc
   +++ b/cpp/src/arrow/filesystem/azurefs_test.cc
   @@ -58,6 +58,7 @@
    #include "arrow/util/logging.h"
    #include "arrow/util/pcg_random.h"
    #include "arrow/util/string.h"
   +#include "arrow/util/thread_pool.h"
    #include "arrow/util/unreachable.h"
    #include "arrow/util/value_parsing.h"
    
   @@ -371,6 +372,8 @@ class TestGeneric : public ::testing::Test, public GenericFileSystemTest {
        if (azure_fs_) {
          ASSERT_OK(azure_fs_->DeleteDir(container_name_));
        }
   +    // Dirty
   +    ASSERT_OK(reinterpret_cast<::arrow::internal::ThreadPool *>(io_context_->executor())->Shutdown());
      }
    
     protected:
   @@ -379,7 +382,8 @@ class TestGeneric : public ::testing::Test, public GenericFileSystemTest {
        random::pcg32_fast rng((std::random_device()()));
        container_name_ = PreexistingData::RandomContainerName(rng);
        ASSERT_OK_AND_ASSIGN(auto options, MakeOptions(env_));
   -    ASSERT_OK_AND_ASSIGN(azure_fs_, AzureFileSystem::Make(options));
   +    io_context_ = std::make_unique<io::IOContext>();
   +    ASSERT_OK_AND_ASSIGN(azure_fs_, AzureFileSystem::Make(options, *io_context_));
        ASSERT_OK(azure_fs_->CreateDir(container_name_, true));
        fs_ = std::make_shared<SubTreeFileSystem>(container_name_, azure_fs_);
      }
   @@ -417,6 +421,7 @@ class TestGeneric : public ::testing::Test, public GenericFileSystemTest {
    
     private:
      std::string container_name_;
   +  std::unique_ptr<io::IOContext> io_context_;
    };
    
    class TestAzuriteGeneric : public TestGeneric {
   ```
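
The effect of the explicit shutdown can be sketched in isolation. This is a hypothetical illustration, not Arrow's `ThreadPool`: `Worker` and `ShutdownBeforeCleanup` are made-up names, and the "library cleanup" entry stands in for `xmlCleanupParser()`. The idea is that joining the thread before cleanup guarantees no thread that used the library outlives it.

```cpp
#include <condition_variable>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical single-thread worker with an explicit, idempotent Shutdown().
class Worker {
 public:
  explicit Worker(std::vector<std::string>* events)
      : events_(events), thread_([this] { Run(); }) {}

  // Falls back to shutting down on destruction if nobody did it explicitly.
  ~Worker() { Shutdown(); }

  // Asks the thread to stop and joins it. Later calls do nothing.
  void Shutdown() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      if (stop_) return;
      stop_ = true;
    }
    variable_.notify_all();
    thread_.join();
    events_->push_back("worker joined");
  }

 private:
  // The thread simply idles until it is asked to stop.
  void Run() {
    std::unique_lock<std::mutex> lock(mutex_);
    variable_.wait(lock, [this] { return stop_; });
  }

  std::vector<std::string>* events_;
  std::mutex mutex_;
  std::condition_variable variable_;
  bool stop_ = false;
  std::thread thread_;  // declared last so the other members exist before it starts
};

// Shuts the worker down explicitly, then performs the stand-in cleanup.
std::vector<std::string> ShutdownBeforeCleanup() {
  std::vector<std::string> events;
  Worker worker(&events);
  worker.Shutdown();                    // explicit shutdown, as in the diff above
  events.push_back("library cleanup");  // stands in for xmlCleanupParser()
  return events;
}
```

`ShutdownBeforeCleanup()` records "worker joined" before "library cleanup". Without the explicit `Shutdown()` call, the join would only happen in `~Worker()`, after the cleanup step.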
   
   

