westonpace commented on code in PR #34050:
URL: https://github.com/apache/arrow/pull/34050#discussion_r1097749621


##########
cpp/src/arrow/engine/substrait/expression_internal.cc:
##########
@@ -320,6 +320,30 @@ Result<compute::Expression> FromProto(const 
substrait::Expression& expr,
       return function_converter(substrait_call);
     }
 
+    case substrait::Expression::kCast: {
+      const auto& cast_exp = expr.cast();
+      ARROW_ASSIGN_OR_RAISE(auto input,
+                            FromProto(cast_exp.input(), ext_set, 
conversion_options));
+
+      // need to convert from the Substrait type spec to the relevant Arrow 
thing
+      // do we need a switch/case or can we just get code similar to the below 
FromProto?
+      // the type should come from cast_expr.type()
+      
+      // this will be a Result<std::pair<std::shared_ptr<DataType>
+      ARROW_ASSIGN_OR_RAISE(auto type,
+                            FromProto(cast_exp.type(), ext_set, 
conversion_options));
+
+      if(cast_exp.failure_behavior() == 
substrait::Expression_Cast_FailureBehavior::Expression_Cast_FailureBehavior_FAILURE_BEHAVIOR_THROW_EXCEPTION){
+        return compute::call("cast", std::move(input), 
compute::CastOptions::Safe(type));
+      } else if(cast_exp.failure_behavior() == 
substrait::Expression_Cast_FailureBehavior::Expression_Cast_FailureBehavior_FAILURE_BEHAVIOR_RETURN_NULL){
+

Review Comment:
   I don't think Acero has any capability to do this today (might be a nice 
addition) so the best we can do is return a NotImplemented error.



##########
cpp/src/arrow/engine/substrait/expression_internal.cc:
##########
@@ -320,6 +320,30 @@ Result<compute::Expression> FromProto(const 
substrait::Expression& expr,
       return function_converter(substrait_call);
     }
 
+    case substrait::Expression::kCast: {
+      const auto& cast_exp = expr.cast();
+      ARROW_ASSIGN_OR_RAISE(auto input,
+                            FromProto(cast_exp.input(), ext_set, 
conversion_options));
+
+      // need to convert from the Substrait type spec to the relevant Arrow 
thing
+      // do we need a switch/case or can we just get code similar to the below 
FromProto?
+      // the type should come from cast_expr.type()
+      
+      // this will be a Result<std::pair<std::shared_ptr<DataType>
+      ARROW_ASSIGN_OR_RAISE(auto type,
+                            FromProto(cast_exp.type(), ext_set, 
conversion_options));
+
+      if(cast_exp.failure_behavior() == 
substrait::Expression_Cast_FailureBehavior::Expression_Cast_FailureBehavior_FAILURE_BEHAVIOR_THROW_EXCEPTION){
+        return compute::call("cast", std::move(input), 
compute::CastOptions::Safe(type));
+      } else if(cast_exp.failure_behavior() == 
substrait::Expression_Cast_FailureBehavior::Expression_Cast_FailureBehavior_FAILURE_BEHAVIOR_RETURN_NULL){
+
+      // e.g. if unspecified  
+      } else {
+        return compute::call("cast", std::move(input), 
compute::CastOptions::Unsafe(type));

Review Comment:
   I notice you switch between safe and unsafe here.  I'm not sure safe/unsafe 
(i.e. is this value invalid?) is the same as throw/null (i.e. this value is 
invalid, what should we do?).  For example, an unsafe cast from int16 to int8 
would overflow 500 to -12 instead of assigning null.  It seems that Substrait 
has no concept of safe/unsafe (might be a good addition to the spec).



##########
cpp/src/arrow/engine/substrait/expression_internal.cc:
##########
@@ -320,6 +320,30 @@ Result<compute::Expression> FromProto(const 
substrait::Expression& expr,
       return function_converter(substrait_call);
     }
 
+    case substrait::Expression::kCast: {
+      const auto& cast_exp = expr.cast();
+      ARROW_ASSIGN_OR_RAISE(auto input,
+                            FromProto(cast_exp.input(), ext_set, 
conversion_options));
+
+      // need to convert from the Substrait type spec to the relevant Arrow 
thing
+      // do we need a switch/case or can we just get code similar to the below 
FromProto?

Review Comment:
   Using `FromProto` as you are doing is the way to go.  The second half of the 
FromProto return is nullability.  In Arrow nullability is a field concept and 
not a type concept.  In Substrait it is a part of the type.  So we have to 
return two pieces when we convert to Arrow.
   
   Acero has no concept of nullability so for the purposes of consumption we 
generally ignore it (this is a [documented 
caveat](https://arrow.apache.org/docs/dev/cpp/streaming_execution.html#types)).
   
   This does raise a bit of a pickle when the user requests EXACT_ROUNDTRIP as 
we may lose information like this.  For example, one thing we will have to 
consider when implementing the reverse operation is whether the target type 
should be marked nullable or not.
   
   A limited but safe approach could be to just pick one (probably nullable) 
and, if we are in EXACT_ROUNDTRIP, then reject any incoming plan that tries to 
cast to a non-nullable field.
   
   A more extensible approach might be to only reject a plan if it changes the 
nullability of the input type.  Then, when converting to proto, we would have 
to make sure the output nullability of the cast matches the input nullability.  
I don't yet have a good enough understanding of what the ToProto path will look 
like to know if this is feasible or not.



##########
cpp/src/arrow/engine/substrait/expression_internal.cc:
##########
@@ -320,6 +320,30 @@ Result<compute::Expression> FromProto(const 
substrait::Expression& expr,
       return function_converter(substrait_call);
     }
 
+    case substrait::Expression::kCast: {
+      const auto& cast_exp = expr.cast();
+      ARROW_ASSIGN_OR_RAISE(auto input,
+                            FromProto(cast_exp.input(), ext_set, 
conversion_options));
+
+      // need to convert from the Substrait type spec to the relevant Arrow 
thing
+      // do we need a switch/case or can we just get code similar to the below 
FromProto?
+      // the type should come from cast_expr.type()
+      
+      // this will be a Result<std::pair<std::shared_ptr<DataType>
+      ARROW_ASSIGN_OR_RAISE(auto type,
+                            FromProto(cast_exp.type(), ext_set, 
conversion_options));
+
+      if(cast_exp.failure_behavior() == 
substrait::Expression_Cast_FailureBehavior::Expression_Cast_FailureBehavior_FAILURE_BEHAVIOR_THROW_EXCEPTION){
+        return compute::call("cast", std::move(input), 
compute::CastOptions::Safe(type));
+      } else if(cast_exp.failure_behavior() == 
substrait::Expression_Cast_FailureBehavior::Expression_Cast_FailureBehavior_FAILURE_BEHAVIOR_RETURN_NULL){
+
+      // e.g. if unspecified  
+      } else {
+        return compute::call("cast", std::move(input), 
compute::CastOptions::Unsafe(type));

Review Comment:
   Without a comment `// Implies XYZ` in the protobuf I am unclear if there is 
a declared default.  As a result, I would think such a plan should be 
considered invalid and rejected.  Otherwise we risk producer and consumer 
thinking different things are the default.  I might raise this on the Substrait 
ML.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to