This is an automated email from the ASF dual-hosted git repository.

yibocai pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
     new 0d9d132e91 GH-33655: [C++][Parquet] Fix occasional failure in 
TestArrowReadWrite.MultithreadedWrite (#33739)
0d9d132e91 is described below

commit 0d9d132e9140f26578369b5ef0b44d25c501e45d
Author: Gang Wu <[email protected]>
AuthorDate: Sat Jan 21 10:34:28 2023 +0800

    GH-33655: [C++][Parquet] Fix occasional failure in 
TestArrowReadWrite.MultithreadedWrite (#33739)
    
    ### Rationale for this change
    
    This 
[commit](https://github.com/apache/arrow/commit/c8d6110a26c41966e539e9fa2f5cb8c31dc2f0fe)
 implements parallel column writing in the parquet writer. However, occasional 
test failure was observed from unit test 
`TestArrowReadWrite.MultithreadedWrite`. The root cause is an unintentional 
call of the copy constructor of `ArrowWriteContext` which results in the buffer 
sharing across all threads.
    
    ### What changes are included in this PR?
    
    This issue is fixed by inserting each context individually to avoid sharing 
buffers.
    
    ### Are these changes tested?
    
    This issue was observed by occasional 
`TestArrowReadWrite.MultithreadedWrite`. Make sure the test is recovered.
    
    Authored-by: Gang Wu <[email protected]>
    Signed-off-by: Yibo Cai <[email protected]>
---
 cpp/src/parquet/arrow/writer.cc | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/cpp/src/parquet/arrow/writer.cc b/cpp/src/parquet/arrow/writer.cc
index 9b51407234..6d22f318f6 100644
--- a/cpp/src/parquet/arrow/writer.cc
+++ b/cpp/src/parquet/arrow/writer.cc
@@ -291,8 +291,13 @@ class FileWriterImpl : public FileWriter {
         arrow_properties_(std::move(arrow_properties)),
         closed_(false) {
     if (arrow_properties_->use_threads()) {
-      parallel_column_write_contexts_.resize(schema_->num_fields(),
-                                             {pool, arrow_properties_.get()});
+      parallel_column_write_contexts_.reserve(schema_->num_fields());
+      for (int i = 0; i < schema_->num_fields(); ++i) {
+        // Explicitly create each ArrowWriteContext object to avoid 
unintentional
+        // call of the copy constructor. Otherwise, the buffers in the type of
+        // sharad_ptr will be shared among all contexts.
+        parallel_column_write_contexts_.emplace_back(pool, 
arrow_properties_.get());
+      }
     }
   }
 

Reply via email to