pitrou commented on code in PR #45372:
URL: https://github.com/apache/arrow/pull/45372#discussion_r1934046857
##########
cpp/src/arrow/record_batch_test.cc:
##########
@@ -393,6 +399,27 @@ TEST_F(TestRecordBatch, RemoveColumnEmpty) {
AssertBatchesEqual(*added, *batch1);
}
+TEST_F(TestRecordBatch, ColumnsThreadSafety) {
+ const int length = 10;
+
+ random::RandomArrayGenerator gen(42);
+ std::shared_ptr<ArrayData> array_data = gen.ArrayOf(utf8(), length)->data();
+ auto schema = ::arrow::schema({field("f1", utf8())});
Review Comment:
There could also be several fields and the worker thread would call
`column(i)` several times with `i` being a random number. Something like
(untested):
```c++
constexpr int kNumFields = 40;
constexpr int kNumThreads = 50;
auto schema = ::arrow::schema(FieldVector(kNumFields, field("f1", utf8())));
auto batch = RecordBatch::Make(schema, length, ArrayDataVector(kNumFields, array_data));
std::random_device rd;
std::vector<std::thread> threads(kNumThreads);
for (auto& thread : threads) {
  const auto seed = rd();
  // Capture `seed` by value: it goes out of scope at the end of each iteration,
  // so a by-reference capture would dangle once the thread runs.
  thread = std::thread([&, seed]() {
    std::default_random_engine rng(seed);
    std::uniform_int_distribution<int> field_dist(0, kNumFields - 1);
    // Each worker looks up kNumFields randomly chosen columns.
    for (int i = 0; i < kNumFields; ++i) {
      ASSERT_NE(nullptr, batch->column(field_dist(rng)));
    }
  });
}
for (auto& thread : threads) {
  thread.join();
}
```
##########
cpp/src/arrow/record_batch_test.cc:
##########
@@ -393,6 +399,27 @@ TEST_F(TestRecordBatch, RemoveColumnEmpty) {
AssertBatchesEqual(*added, *batch1);
}
+TEST_F(TestRecordBatch, ColumnsThreadSafety) {
+ const int length = 10;
+
+ random::RandomArrayGenerator gen(42);
+ std::shared_ptr<ArrayData> array_data = gen.ArrayOf(utf8(), length)->data();
+ auto schema = ::arrow::schema({field("f1", utf8())});
+ auto record_batch = RecordBatch::Make(schema, length, {array_data});
+ std::atomic_bool start_flag{false};
+ std::thread t([record_batch, &start_flag]() {
+ start_flag.store(true);
+ auto columns = record_batch->columns();
+ ASSERT_EQ(columns.size(), 1);
Review Comment:
Why this test? `boxed_columns_` is presized in the constructor, so this
should always succeed.
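To make that concrete, here is a rough standalone sketch. `LazyColumns` is a made-up stand-in, not Arrow's actual `SimpleRecordBatch`, and it only assumes what is stated above: the cache vector is sized once in the constructor and its entries are filled on first access. Under that assumption `size()` returns the same value before and after any column is boxed, so a size check cannot observe a race in the lazy path; only the individual entries can.

```c++
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a batch that boxes its columns lazily.
class LazyColumns {
 public:
  // The cache is presized here; every entry starts out null.
  explicit LazyColumns(std::size_t num_columns) : boxed_(num_columns) {}

  // Materializes entry i on first access (single-threaded sketch, no locking).
  std::shared_ptr<int> column(std::size_t i) {
    if (!boxed_[i]) boxed_[i] = std::make_shared<int>(static_cast<int>(i));
    return boxed_[i];
  }

  std::size_t size() const { return boxed_.size(); }
  bool is_boxed(std::size_t i) const { return boxed_[i] != nullptr; }

 private:
  std::vector<std::shared_ptr<int>> boxed_;
};

// Returns true if size() is the same before and after an entry is boxed.
bool SizeIsIndependentOfBoxing() {
  LazyColumns cols(3);
  const std::size_t before = cols.size();
  const bool was_boxed = cols.is_boxed(1);
  const bool got_column = cols.column(1) != nullptr;
  return before == 3 && cols.size() == 3 && !was_boxed && got_column &&
         cols.is_boxed(1);
}
```

Checking the returned entries (for example that each one is non-null) exercises the lazily written state, which is the part that concurrent callers actually touch.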
##########
cpp/src/arrow/record_batch_test.cc:
##########
@@ -393,6 +399,27 @@ TEST_F(TestRecordBatch, RemoveColumnEmpty) {
AssertBatchesEqual(*added, *batch1);
}
+TEST_F(TestRecordBatch, ColumnsThreadSafety) {
+ const int length = 10;
+
+ random::RandomArrayGenerator gen(42);
+ std::shared_ptr<ArrayData> array_data = gen.ArrayOf(utf8(), length)->data();
+ auto schema = ::arrow::schema({field("f1", utf8())});
+ auto record_batch = RecordBatch::Make(schema, length, {array_data});
+ std::atomic_bool start_flag{false};
+ std::thread t([record_batch, &start_flag]() {
+ start_flag.store(true);
+ auto columns = record_batch->columns();
+ ASSERT_EQ(columns.size(), 1);
+ });
+ // Wait for thread startup
+ while (!start_flag.load()) {
Review Comment:
This isn't even useful if there are several threads doing the same thing.
##########
cpp/src/arrow/record_batch_test.cc:
##########
@@ -393,6 +399,27 @@ TEST_F(TestRecordBatch, RemoveColumnEmpty) {
AssertBatchesEqual(*added, *batch1);
}
+TEST_F(TestRecordBatch, ColumnsThreadSafety) {
+ const int length = 10;
+
+ random::RandomArrayGenerator gen(42);
+ std::shared_ptr<ArrayData> array_data = gen.ArrayOf(utf8(), length)->data();
+ auto schema = ::arrow::schema({field("f1", utf8())});
+ auto record_batch = RecordBatch::Make(schema, length, {array_data});
+ std::atomic_bool start_flag{false};
+ std::thread t([record_batch, &start_flag]() {
Review Comment:
Yes, this should use several threads that would do the same thing
concurrently.
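For reference, the overall shape of such a test can be sketched without Arrow at all. Everything below is hypothetical: `LazyBatch` is a made-up stand-in for a batch with a lazily filled column cache, the single coarse mutex is just one assumed way of making it safe, and `CountFailedLookups` is an illustrative helper name. Each thread runs the same random-access workload, and failures are recorded per thread and checked after joining rather than asserted inside the workers.

```c++
#include <cassert>
#include <memory>
#include <mutex>
#include <random>
#include <thread>
#include <vector>

// Hypothetical stand-in for a batch with a presized, lazily filled column cache.
class LazyBatch {
 public:
  explicit LazyBatch(int num_columns) : boxed_(num_columns) {}

  std::shared_ptr<int> column(int i) {
    std::lock_guard<std::mutex> lock(mutex_);  // assumption: one coarse lock
    if (!boxed_[i]) boxed_[i] = std::make_shared<int>(i);
    return boxed_[i];
  }

 private:
  std::mutex mutex_;
  std::vector<std::shared_ptr<int>> boxed_;
};

// Runs the same workload on every thread and returns the number of lookups
// that came back null or with the wrong value.
int CountFailedLookups(int num_columns, int num_threads) {
  LazyBatch batch(num_columns);
  std::vector<int> failures(num_threads, 0);  // one counter per thread
  std::vector<std::thread> threads;
  threads.reserve(num_threads);
  std::random_device rd;
  for (int t = 0; t < num_threads; ++t) {
    const auto seed = rd();
    // `seed` and `t` are captured by value; shared objects by reference.
    threads.emplace_back([&batch, &failures, t, seed, num_columns]() {
      std::default_random_engine rng(seed);
      std::uniform_int_distribution<int> dist(0, num_columns - 1);
      for (int k = 0; k < num_columns; ++k) {
        const int i = dist(rng);
        auto col = batch.column(i);
        if (col == nullptr || *col != i) ++failures[t];
      }
    });
  }
  for (auto& thread : threads) thread.join();
  int total = 0;
  for (int f : failures) total += f;
  return total;
}
```

A test built this way would call something like `CountFailedLookups(40, 50)` and expect zero. Because all threads start on their own and hit the cache at overlapping times, no start flag or wait loop is needed.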