jorisvandenbossche commented on code in PR #14495:
URL: https://github.com/apache/arrow/pull/14495#discussion_r1006672327
##########
cpp/src/arrow/compute/kernels/scalar_nested.cc:
##########
@@ -252,6 +266,39 @@ struct StructFieldFunctor {
return Status::OK();
}
+ static Result<std::shared_ptr<Array>> ApplyFieldRef(KernelContext* ctx,
+ const FieldRef&
field_ref,
+ std::shared_ptr<Array>
current) {
+ if (current->type_id() != Type::STRUCT) {
+ return Status::Invalid("Not a StructArray: ", current->ToString(),
+ "\nMaybe a bad FieldRef? ", field_ref.ToString());
+ }
+
+ if (field_ref.IsName()) {
+ const auto& array = checked_cast<const StructArray&>(*current);
+ current = array.GetFieldByName(*field_ref.name());
Review Comment:
I think `GetFieldByName` actually has the same concern as mentioned before
about not taking into account top-level nulls? (so also for this case can
convert to FieldPath with FindOne, and then this can be simplified a bit
further)
(if that's the case, that probably indicates the tests needs to be expanded
with a case of using a named FieldRef in the test with nulls)
##########
cpp/src/arrow/compute/kernels/scalar_nested.cc:
##########
@@ -271,11 +318,30 @@ struct StructFieldFunctor {
Result<TypeHolder> ResolveStructFieldType(KernelContext* ctx,
const std::vector<TypeHolder>&
types) {
- const auto& options = OptionsWrapper<StructFieldOptions>::Get(ctx);
+ const auto& field_ref =
OptionsWrapper<StructFieldOptions>::Get(ctx).field_ref;
const DataType* type = types.front().type;
- for (const auto& index : options.indices) {
- RETURN_NOT_OK(StructFieldFunctor::CheckIndex(index, *type));
- type = type->field(index)->type().get();
+
+ if (field_ref.IsName()) {
+ for (const auto& path : field_ref.FindAll(*type)) {
+ for (const auto& index : path.indices()) {
+ RETURN_NOT_OK(StructFieldFunctor::CheckIndex(index, *type));
+ type = type->field(index)->type().get();
+ }
+ }
Review Comment:
The fact that you use FindAll here, that would catch duplicate field names?
Maybe we don't necessarily have to support that? (also the way that you loop
over those might not be correct?)
##########
cpp/src/arrow/compute/kernels/scalar_nested.cc:
##########
@@ -271,11 +318,30 @@ struct StructFieldFunctor {
Result<TypeHolder> ResolveStructFieldType(KernelContext* ctx,
const std::vector<TypeHolder>&
types) {
- const auto& options = OptionsWrapper<StructFieldOptions>::Get(ctx);
+ const auto& field_ref =
OptionsWrapper<StructFieldOptions>::Get(ctx).field_ref;
const DataType* type = types.front().type;
- for (const auto& index : options.indices) {
- RETURN_NOT_OK(StructFieldFunctor::CheckIndex(index, *type));
- type = type->field(index)->type().get();
+
+ if (field_ref.IsName()) {
+ for (const auto& path : field_ref.FindAll(*type)) {
+ for (const auto& index : path.indices()) {
+ RETURN_NOT_OK(StructFieldFunctor::CheckIndex(index, *type));
+ type = type->field(index)->type().get();
+ }
+ }
+ } else {
+ DCHECK(field_ref.IsFieldPath() || field_ref.IsNested());
+
+ FieldPath field_path;
+ if (field_ref.IsNested()) {
+ ARROW_ASSIGN_OR_RAISE(field_path, field_ref.FindOne(*type));
+ } else {
+ field_path = *field_ref.field_path();
+ }
Review Comment:
I am wondering if it is actually needed to do this if/else. Also a FieldRef
backed by a field path has a FindOne method, and in that case it will basically
return the same field path, while still checking that it is a valid field path
(not out of bounds)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]