This is an automated email from the ASF dual-hosted git repository.
apitrou pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow.git
The following commit(s) were added to refs/heads/main by this push:
new 98620f54a7 GH-48134: [C++] Make StructArray::field() thread-safe
(#48128)
98620f54a7 is described below
commit 98620f54a742d06e799aca02622cdbc67e4cc916
Author: tobim <[email protected]>
AuthorDate: Thu Nov 20 22:49:33 2025 +0100
GH-48134: [C++] Make StructArray::field() thread-safe (#48128)
### Rationale for this change
Before this change it was possible for two threads calling `field()` with
the same index at the same time to cause a race on the stored entry in
`boxed_fields_`. I.e. if a second thread goes into the path that calls
`MakeArray` before the first thread stored its own new array, the second thread
would also write to the same `shared_ptr` and invalidate the `shared_ptr` from
the first thread, thereby also invalidating the returned reference.
### What changes are included in this PR?
This PR changes the return type of `StructArray::field()` from
`shared_ptr<Array>&` to `shared_ptr<Array>` giving the caller co-ownership of
the data and safeguarding against any potential concurrent writes to the
underlying `boxed_fields_` vector.
It also changes the body to use the CAS pattern to avoid multiple
concurrent writes to the same address.
### Are these changes tested?
I don't know how to write a deterministic test that triggers the issue
before the fix. Even a non-deterministic test needs to run with address
sanitizer or valgrind or something similar.
I can however confirm that this change fixes an issue that I've been
debugging in https://github.com/tenzir/tenzir.
### Are there any user-facing changes?
While changing `StructArray::field()` to return by value is an API change,
I believe this should be compatible with regular uses of that function.
* GitHub Issue: #48134
Authored-by: Tobias Mayer <[email protected]>
Signed-off-by: Antoine Pitrou <[email protected]>
---
cpp/src/arrow/array/array_nested.cc | 29 +++++++++++++++++------------
cpp/src/arrow/array/array_nested.h | 2 +-
2 files changed, 18 insertions(+), 13 deletions(-)
diff --git a/cpp/src/arrow/array/array_nested.cc
b/cpp/src/arrow/array/array_nested.cc
index b821451419..b7f1686079 100644
--- a/cpp/src/arrow/array/array_nested.cc
+++ b/cpp/src/arrow/array/array_nested.cc
@@ -1076,20 +1076,25 @@ const ArrayVector& StructArray::fields() const {
return boxed_fields_;
}
-const std::shared_ptr<Array>& StructArray::field(int i) const {
+std::shared_ptr<Array> StructArray::field(int i) const {
std::shared_ptr<Array> result = std::atomic_load(&boxed_fields_[i]);
- if (!result) {
- std::shared_ptr<ArrayData> field_data;
- if (data_->offset != 0 || data_->child_data[i]->length != data_->length) {
- field_data = data_->child_data[i]->Slice(data_->offset, data_->length);
- } else {
- field_data = data_->child_data[i];
- }
- result = MakeArray(field_data);
- std::atomic_store(&boxed_fields_[i], std::move(result));
- return boxed_fields_[i];
+ if (result) {
+ return result;
}
- return boxed_fields_[i];
+ std::shared_ptr<ArrayData> field_data;
+ if (data_->offset != 0 || data_->child_data[i]->length != data_->length) {
+ field_data = data_->child_data[i]->Slice(data_->offset, data_->length);
+ } else {
+ field_data = data_->child_data[i];
+ }
+ result = MakeArray(field_data);
+ // Check if some other thread inserted the array in the meantime and return
+ // that in that case.
+ std::shared_ptr<Array> expected = nullptr;
+ if (!std::atomic_compare_exchange_strong(&boxed_fields_[i], &expected,
result)) {
+ result = std::move(expected);
+ }
+ return result;
}
std::shared_ptr<Array> StructArray::GetFieldByName(const std::string& name)
const {
diff --git a/cpp/src/arrow/array/array_nested.h
b/cpp/src/arrow/array/array_nested.h
index f122f9378b..2591fdaf41 100644
--- a/cpp/src/arrow/array/array_nested.h
+++ b/cpp/src/arrow/array/array_nested.h
@@ -692,7 +692,7 @@ class ARROW_EXPORT StructArray : public Array {
// Return a shared pointer in case the requestor desires to share ownership
// with this array. The returned array has its offset, length and null
// count adjusted.
- const std::shared_ptr<Array>& field(int pos) const;
+ std::shared_ptr<Array> field(int pos) const;
const ArrayVector& fields() const;