colin-r-schultz opened a new issue, #45371:
URL: https://github.com/apache/arrow/issues/45371
### Describe the bug, including details regarding any error messages, version, and platform.
The following example test case, when run under TSAN, reports a data race.
```cpp
TEST_F(TestRecordBatch, ColumnsThreadSafety) {
  const int length = 10;

  random::RandomArrayGenerator gen(42);
  std::shared_ptr<ArrayData> array_data = gen.ArrayOf(utf8(), length)->data();
  auto schema = ::arrow::schema({field("f1", utf8())});
  auto record_batch = RecordBatch::Make(schema, length, {array_data});

  std::atomic_bool start_flag{false};
  std::thread t([record_batch, &start_flag]() {
    start_flag.store(true);
    auto columns = record_batch->columns();
    ASSERT_EQ(columns.size(), 1);
  });

  // Wait for thread startup
  while (!start_flag.load()) {
  }

  auto columns = record_batch->columns();
  ASSERT_EQ(columns.size(), 1);
  t.join();
}
```
The relevant definitions in `record_batch.cc` are below:
```cpp
const std::vector<std::shared_ptr<Array>>& columns() const override {
  for (int i = 0; i < num_columns(); ++i) {
    // Force all columns to be boxed
    column(i);
  }
  return boxed_columns_;
}

std::shared_ptr<Array> column(int i) const override {
  std::shared_ptr<Array> result = std::atomic_load(&boxed_columns_[i]);
  if (!result) {
    result = MakeArray(columns_[i]);
    std::atomic_store(&boxed_columns_[i], result);
  }
  return result;
}
```
The `columns()` method returns a reference to the `mutable` member
`boxed_columns_`, assuming that it is fully initialized and will not be
written to again. However, multiple threads can race to initialize
`boxed_columns_[i]`: two threads can both observe a null slot in `column(i)`,
and each then performs its own `std::atomic_store`, so a slot can be written
again after `column(i)` has already returned for the first time. These atomic
writes can race against non-atomic reads of the `boxed_columns_` vector after
it is returned by `columns()`. This is undefined behavior and can lead to a
use-after-free of the contained `Array`s.
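
One possible way to avoid the second write is to publish each slot with a
compare-and-exchange, so that a slot is only ever stored to while it is still
null. The following is a minimal standalone sketch of that idea, not a patch
against Arrow: `LazyColumns`, `Boxed`, and `AllThreadsSeeSameColumn` are
hypothetical names that stand in for `SimpleRecordBatch`, `Array`, and a test
helper. It relies on the `std::shared_ptr` overloads of the `std::atomic_*`
free functions, which are deprecated as of C++20.

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for arrow::Array.
struct Boxed {
  int source_index;
};

// Hypothetical stand-in for SimpleRecordBatch's lazy boxing logic.
class LazyColumns {
 public:
  explicit LazyColumns(int n) : boxed_(n) {}

  std::shared_ptr<Boxed> column(int i) const {
    assert(i >= 0 && i < static_cast<int>(boxed_.size()));
    std::shared_ptr<Boxed> result = std::atomic_load(&boxed_[i]);
    if (!result) {
      auto fresh = std::make_shared<Boxed>(Boxed{i});
      // Store only if the slot is still null. If another thread won the race,
      // the slot is left untouched and `result` receives the winner's value.
      if (std::atomic_compare_exchange_strong(&boxed_[i], &result, fresh)) {
        result = std::move(fresh);
      }
    }
    return result;
  }

  const std::vector<std::shared_ptr<Boxed>>& columns() const {
    for (int i = 0; i < static_cast<int>(boxed_.size()); ++i) {
      column(i);  // Force every slot to be populated exactly once.
    }
    return boxed_;
  }

 private:
  mutable std::vector<std::shared_ptr<Boxed>> boxed_;
};

// Returns true if every thread observed the same object in slot 0.
bool AllThreadsSeeSameColumn(int num_threads) {
  LazyColumns batch(1);
  std::vector<const Boxed*> seen(num_threads, nullptr);
  std::vector<std::thread> threads;
  for (int t = 0; t < num_threads; ++t) {
    threads.emplace_back(
        [&batch, &seen, t] { seen[t] = batch.columns()[0].get(); });
  }
  for (auto& th : threads) th.join();
  for (const Boxed* p : seen) {
    if (p == nullptr || p != seen[0]) return false;
  }
  return true;
}
```

With this shape, once `column(i)` has returned on any thread, slot `i` is
non-null and no later call stores to it, which is the property that
`columns()` assumes when it hands out a reference to the vector.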
### Component(s)
C++