This is an automated email from the ASF dual-hosted git repository.

apitrou pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/main by this push:
     new 98620f54a7 GH-48134: [C++] Make StructArray::field() thread-safe 
(#48128)
98620f54a7 is described below

commit 98620f54a742d06e799aca02622cdbc67e4cc916
Author: tobim <[email protected]>
AuthorDate: Thu Nov 20 22:49:33 2025 +0100

    GH-48134: [C++] Make StructArray::field() thread-safe (#48128)
    
    ### Rationale for this change
    
    Before this change it was possible for two threads calling `field()` with 
the same index at the same time to cause a race on the stored entry in 
`boxed_fields_`. I.e. if a second thread goes into the path that calls 
`MakeArray` before the first thread stored its own new array, the second thread 
would also write to the same `shared_ptr` and invalidate the `shared_ptr` from 
the first thread, thereby also invalidating the returned reference.
    
    ### What changes are included in this PR?
    
    This PR changes the return type of `StructArray::field()` from 
`shared_ptr<Array>&` to `shared_ptr<Array>` giving the caller co-ownership of 
the data and safeguarding against any potential concurrent writes to the 
underlying `boxed_fields_` vector.
    It also changes the body to use the CAS pattern to avoid multiple 
concurrent writes to the same address.
    
    ### Are these changes tested?
    
    I don't know how to write a deterministic test that triggers the issue 
before the fix. Even a non-deterministic test needs to run with address 
sanitizer or valgrind or something similar.
    
    I can however confirm that this change fixes an issue that I've been 
debugging in https://github.com/tenzir/tenzir.
    
    ### Are there any user-facing changes?
    
    While changing `StructArray::field()` to return by value is an API change, 
I believe this should be compatible with regular uses of that function.
    
    * GitHub Issue: #48134
    
    Authored-by: Tobias Mayer <[email protected]>
    Signed-off-by: Antoine Pitrou <[email protected]>
---
 cpp/src/arrow/array/array_nested.cc | 29 +++++++++++++++++------------
 cpp/src/arrow/array/array_nested.h  |  2 +-
 2 files changed, 18 insertions(+), 13 deletions(-)

diff --git a/cpp/src/arrow/array/array_nested.cc 
b/cpp/src/arrow/array/array_nested.cc
index b821451419..b7f1686079 100644
--- a/cpp/src/arrow/array/array_nested.cc
+++ b/cpp/src/arrow/array/array_nested.cc
@@ -1076,20 +1076,25 @@ const ArrayVector& StructArray::fields() const {
   return boxed_fields_;
 }
 
-const std::shared_ptr<Array>& StructArray::field(int i) const {
+std::shared_ptr<Array> StructArray::field(int i) const {
   std::shared_ptr<Array> result = std::atomic_load(&boxed_fields_[i]);
-  if (!result) {
-    std::shared_ptr<ArrayData> field_data;
-    if (data_->offset != 0 || data_->child_data[i]->length != data_->length) {
-      field_data = data_->child_data[i]->Slice(data_->offset, data_->length);
-    } else {
-      field_data = data_->child_data[i];
-    }
-    result = MakeArray(field_data);
-    std::atomic_store(&boxed_fields_[i], std::move(result));
-    return boxed_fields_[i];
+  if (result) {
+    return result;
   }
-  return boxed_fields_[i];
+  std::shared_ptr<ArrayData> field_data;
+  if (data_->offset != 0 || data_->child_data[i]->length != data_->length) {
+    field_data = data_->child_data[i]->Slice(data_->offset, data_->length);
+  } else {
+    field_data = data_->child_data[i];
+  }
+  result = MakeArray(field_data);
+  // Check if some other thread inserted the array in the meantime and return
+  // that in that case.
+  std::shared_ptr<Array> expected = nullptr;
+  if (!std::atomic_compare_exchange_strong(&boxed_fields_[i], &expected, 
result)) {
+    result = std::move(expected);
+  }
+  return result;
 }
 
 std::shared_ptr<Array> StructArray::GetFieldByName(const std::string& name) 
const {
diff --git a/cpp/src/arrow/array/array_nested.h 
b/cpp/src/arrow/array/array_nested.h
index f122f9378b..2591fdaf41 100644
--- a/cpp/src/arrow/array/array_nested.h
+++ b/cpp/src/arrow/array/array_nested.h
@@ -692,7 +692,7 @@ class ARROW_EXPORT StructArray : public Array {
   // Return a shared pointer in case the requestor desires to share ownership
   // with this array.  The returned array has its offset, length and null
   // count adjusted.
-  const std::shared_ptr<Array>& field(int pos) const;
+  std::shared_ptr<Array> field(int pos) const;
 
   const ArrayVector& fields() const;
 

Reply via email to