westonpace commented on code in PR #34050:
URL: https://github.com/apache/arrow/pull/34050#discussion_r1097749621
##########
cpp/src/arrow/engine/substrait/expression_internal.cc:
##########
@@ -320,6 +320,30 @@ Result<compute::Expression> FromProto(const
substrait::Expression& expr,
return function_converter(substrait_call);
}
+ case substrait::Expression::kCast: {
+ const auto& cast_exp = expr.cast();
+ ARROW_ASSIGN_OR_RAISE(auto input,
+ FromProto(cast_exp.input(), ext_set,
conversion_options));
+
+ // need to convert from the Substrait type spec to the relevant Arrow
thing
+ // do we need a switch/case or can we just get code similar to the below
FromProto?
+ // the type should come from cast_expr.type()
+
+ // this will be a Result<std::pair<std::shared_ptr<DataType>
+ ARROW_ASSIGN_OR_RAISE(auto type,
+ FromProto(cast_exp.type(), ext_set,
conversion_options));
+
+ if(cast_exp.failure_behavior() ==
substrait::Expression_Cast_FailureBehavior::Expression_Cast_FailureBehavior_FAILURE_BEHAVIOR_THROW_EXCEPTION){
+ return compute::call("cast", std::move(input),
compute::CastOptions::Safe(type));
+ } else if(cast_exp.failure_behavior() ==
substrait::Expression_Cast_FailureBehavior::Expression_Cast_FailureBehavior_FAILURE_BEHAVIOR_RETURN_NULL){
+
Review Comment:
I don't think Acero has any capability to do this today (might be a nice
addition) so the best we can do is return a NotImplemented error.
##########
cpp/src/arrow/engine/substrait/expression_internal.cc:
##########
@@ -320,6 +320,30 @@ Result<compute::Expression> FromProto(const
substrait::Expression& expr,
return function_converter(substrait_call);
}
+ case substrait::Expression::kCast: {
+ const auto& cast_exp = expr.cast();
+ ARROW_ASSIGN_OR_RAISE(auto input,
+ FromProto(cast_exp.input(), ext_set,
conversion_options));
+
+ // need to convert from the Substrait type spec to the relevant Arrow
thing
+ // do we need a switch/case or can we just get code similar to the below
FromProto?
+ // the type should come from cast_expr.type()
+
+ // this will be a Result<std::pair<std::shared_ptr<DataType>
+ ARROW_ASSIGN_OR_RAISE(auto type,
+ FromProto(cast_exp.type(), ext_set,
conversion_options));
+
+ if(cast_exp.failure_behavior() ==
substrait::Expression_Cast_FailureBehavior::Expression_Cast_FailureBehavior_FAILURE_BEHAVIOR_THROW_EXCEPTION){
+ return compute::call("cast", std::move(input),
compute::CastOptions::Safe(type));
+ } else if(cast_exp.failure_behavior() ==
substrait::Expression_Cast_FailureBehavior::Expression_Cast_FailureBehavior_FAILURE_BEHAVIOR_RETURN_NULL){
+
+ // e.g. if unspecified
+ } else {
+ return compute::call("cast", std::move(input),
compute::CastOptions::Unsafe(type));
Review Comment:
I notice you switch between safe and unsafe here. I'm not sure safe/unsafe
(i.e. is this value invalid?) is the same as throw/null (i.e. this value is
invalid, what should we do?). For example, an unsafe cast from int16 to int8
would overflow 500 to -12 instead of assigning null. It seems that Substrait
has no concept of safe/unsafe (might be a good addition to the spec).
##########
cpp/src/arrow/engine/substrait/expression_internal.cc:
##########
@@ -320,6 +320,30 @@ Result<compute::Expression> FromProto(const
substrait::Expression& expr,
return function_converter(substrait_call);
}
+ case substrait::Expression::kCast: {
+ const auto& cast_exp = expr.cast();
+ ARROW_ASSIGN_OR_RAISE(auto input,
+ FromProto(cast_exp.input(), ext_set,
conversion_options));
+
+ // need to convert from the Substrait type spec to the relevant Arrow
thing
+ // do we need a switch/case or can we just get code similar to the below
FromProto?
Review Comment:
Using `FromProto` as you are doing is the way to go. The second half of the
FromProto return is nullability. In Arrow nullability is a field concept and
not a type concept. In Substrait it is a part of the type. So we have to
return two pieces when we convert to Arrow.
Acero has no concept of nullability so for the purposes of consumption we
generally ignore it (this is a [documented
caveat](https://arrow.apache.org/docs/dev/cpp/streaming_execution.html#types)).
This does raise a bit of a pickle when the user requests EXACT_ROUNDTRIP as
we may lose information like this. For example, one thing we will have to
consider when implementing the reverse operation is whether the target type
should be marked nullable or not.
A limited but safe approach could be to just pick one (probably nullable)
and, if we are in EXACT_ROUNDTRIP, then reject any incoming plan that tries to
cast to a non-nullable field.
A more extensible approach might be to only reject a plan if it changes the
nullability of the input type. Then, when converting to proto, we would have
to make sure the output nullability of the cast matches the input nullability.
I don't yet have a good enough understanding of what the ToProto path will look
like to know if this is feasible or not.
##########
cpp/src/arrow/engine/substrait/expression_internal.cc:
##########
@@ -320,6 +320,30 @@ Result<compute::Expression> FromProto(const
substrait::Expression& expr,
return function_converter(substrait_call);
}
+ case substrait::Expression::kCast: {
+ const auto& cast_exp = expr.cast();
+ ARROW_ASSIGN_OR_RAISE(auto input,
+ FromProto(cast_exp.input(), ext_set,
conversion_options));
+
+ // need to convert from the Substrait type spec to the relevant Arrow
thing
+ // do we need a switch/case or can we just get code similar to the below
FromProto?
+ // the type should come from cast_expr.type()
+
+ // this will be a Result<std::pair<std::shared_ptr<DataType>
+ ARROW_ASSIGN_OR_RAISE(auto type,
+ FromProto(cast_exp.type(), ext_set,
conversion_options));
+
+ if(cast_exp.failure_behavior() ==
substrait::Expression_Cast_FailureBehavior::Expression_Cast_FailureBehavior_FAILURE_BEHAVIOR_THROW_EXCEPTION){
+ return compute::call("cast", std::move(input),
compute::CastOptions::Safe(type));
+ } else if(cast_exp.failure_behavior() ==
substrait::Expression_Cast_FailureBehavior::Expression_Cast_FailureBehavior_FAILURE_BEHAVIOR_RETURN_NULL){
+
+ // e.g. if unspecified
+ } else {
+ return compute::call("cast", std::move(input),
compute::CastOptions::Unsafe(type));
Review Comment:
Without a comment `// Implies XYZ` in the protobuf I am unclear if there is
a declared default. As a result, I would think such a plan should be
considered invalid and rejected. Otherwise we risk producer and consumer
thinking different things are the default. I might raise this on the Substrait
ML.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]