benibus commented on code in PR #34537:
URL: https://github.com/apache/arrow/pull/34537#discussion_r1140678912
##########
cpp/src/arrow/type.cc:
##########
@@ -1159,6 +1161,20 @@ struct FieldPathGetImpl {
});
}
+ static Result<std::shared_ptr<ChunkedArray>> Get(const FieldPath* path,
+ const ChunkedArrayVector&
table) {
+ return FieldPathGetImpl::Get(
+ path, &table,
+ [](const std::shared_ptr<ChunkedArray>& data) -> const
ChunkedArrayVector * {
+ auto chunkedArrayVector = data->Flatten();
Review Comment:
We typically use `snake_case` for variable names - so
`chunked_array_vector`. That includes test code, so you might want to revise
the names there too.
##########
cpp/src/arrow/type_test.cc:
##########
@@ -379,6 +382,72 @@ TEST(TestFieldPath, Basics) {
FieldPath({s.num_fields() * 2}).Get(s));
}
+TEST(TestFieldPath, GetForTable) {
+ using testing::HasSubstr;
+
+ const int length = 100;
+ auto f0 = field("alpha", int32());
+ auto f1 = field("beta", int32());
+ auto f2 = field("alpha", int32());
+ auto f3 = field("beta", int32());
+ auto schema = arrow::schema({f0, f1, f2, f3});
+
+ arrow::random::RandomArrayGenerator gen_{42};
+ auto a0 = gen_.ArrayOf(int32(), length);
+ auto a1 = gen_.ArrayOf(int32(), length);
+ auto a2 = gen_.ArrayOf(int32(), length);
+ auto a3 = gen_.ArrayOf(int32(), length);
+ auto arrayVector = ArrayVector({a0, a1, a2, a3});
+
+ auto tablePtr = Table::Make(schema, {a0, a1, a2, a3});
Review Comment:
It would be worth throwing in a `ASSERT_OK(tablePtr->ValidateFull())` after
this (same for `RecordBatch::Make` i guess).
##########
cpp/src/arrow/type.cc:
##########
@@ -1159,6 +1161,20 @@ struct FieldPathGetImpl {
});
}
+ static Result<std::shared_ptr<ChunkedArray>> Get(const FieldPath* path,
+ const ChunkedArrayVector&
table) {
+ return FieldPathGetImpl::Get(
+ path, &table,
+ [](const std::shared_ptr<ChunkedArray>& data) -> const
ChunkedArrayVector * {
+ auto chunkedArrayVector = data->Flatten();
+ if (!chunkedArrayVector.ok()) {
+ return nullptr;
+ }
+
+ return &(chunkedArrayVector.ValueUnsafe());
Review Comment:
The returned pointer won't be valid after the call since
`chunkedArrayVector` gets freed once the function's scope exits. (If it doesn't
segfault locally, building with `ARROW_USE_ASAN` and `ARROW_USE_UBSAN` should
probably catch it).
Instead, you might want to pre-define the vector in the outer scope, then
capture it by reference in the lambda.
##########
cpp/src/arrow/type_test.cc:
##########
@@ -379,6 +382,72 @@ TEST(TestFieldPath, Basics) {
FieldPath({s.num_fields() * 2}).Get(s));
}
+TEST(TestFieldPath, GetForTable) {
Review Comment:
You should test nested fields here as well (i.e. struct fields in the
schema) - preferably with multiple child fields. That would probably expose the
aforementioned use-after-free issue but we're currently only testing the top
nesting level.
##########
cpp/src/arrow/type.cc:
##########
@@ -1159,6 +1161,20 @@ struct FieldPathGetImpl {
});
}
+ static Result<std::shared_ptr<ChunkedArray>> Get(const FieldPath* path,
+ const ChunkedArrayVector&
table) {
+ return FieldPathGetImpl::Get(
+ path, &table,
+ [](const std::shared_ptr<ChunkedArray>& data) -> const
ChunkedArrayVector * {
+ auto chunkedArrayVector = data->Flatten();
+ if (!chunkedArrayVector.ok()) {
+ return nullptr;
+ }
Review Comment:
We probably want to propagate a non-ok status here rather than implicitly
discard it. Perhaps you could capture a `Status` variable in the outer-scope
which you could check after calling `FieldPathGetImpl::Get`?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]