rohitrastogi commented on code in PR #399:
URL: https://github.com/apache/datafusion-comet/pull/399#discussion_r1596117835


##########
core/src/execution/datafusion/expressions/cast.rs:
##########
@@ -232,6 +232,240 @@ macro_rules! cast_int_to_int_macro {
     }};
 }
 
+// When Spark casts to Byte/Short Types, it does not cast directly to Byte/Short.
+// It casts to Int first and then to Byte/Short. Because of potential overflows in the Int cast,
+// this can cause unexpected Short/Byte cast results. Replicate this behavior.
+macro_rules! cast_float_to_int16_down {
+    (
+        $array:expr,
+        $eval_mode:expr,
+        $src_array_type:ty,
+        $dest_array_type:ty,
+        $rust_src_type:ty,
+        $rust_dest_type:ty,
+        $src_type_str:expr,
+        $dest_type_str:expr,
+        $format_str:expr
+    ) => {{
+        let cast_array = $array
+            .as_any()
+            .downcast_ref::<$src_array_type>()
+            .expect(concat!("Expected a ", stringify!($src_array_type)));
+
+        let output_array = match $eval_mode {
+            EvalMode::Ansi => cast_array
+                .iter()
+                .map(|value| match value {
+                    Some(value) => {
+                        let is_overflow = value.is_nan() || value.abs() as i32 == std::i32::MAX;

Review Comment:
   @andygrove This condition is incorrect.
   It should be something like:
   ```
   let is_overflow = value.is_nan() || (value as f64).floor() > (std::i32::MAX as f64) || (value as f64).ceil() < (std::i32::MIN as f64);
   ```
   
   This is the check Spark's Scala code performs in `FloatExactNumeric`.
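
A minimal Rust sketch of the range check suggested above, set beside the condition from the diff. The function names are hypothetical, and the sketch assumes an `f32` source and Rust's saturating float-to-int `as` casts; it is an illustration, not the fix that was eventually merged.

```rust
/// Hypothetical helper: range check in the style of the suggestion above.
/// A float overflows i32 if it is NaN, or if truncating it toward zero
/// would land outside [i32::MIN, i32::MAX].
fn overflows_i32(value: f32) -> bool {
    let v = value as f64; // widening f32 -> f64 is exact
    v.is_nan() || v.floor() > i32::MAX as f64 || v.ceil() < i32::MIN as f64
}

/// The condition from the diff, kept only for comparison.
/// `as i32` saturates, so any magnitude >= 2^31 compares equal to i32::MAX.
fn overflows_i32_original(value: f32) -> bool {
    value.is_nan() || value.abs() as i32 == i32::MAX
}
```

With these definitions, `i32::MIN as f32` (exactly -2147483648) is in range according to `overflows_i32`, but `overflows_i32_original` flags it, because `abs()` yields 2147483648 and the cast saturates to `i32::MAX`. Both functions flag NaN and `i32::MAX as f32`.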
   
   I'm working on a fix with improved tests. There appear to be some tedious
edge cases in how Java/Scala format the float in the error string, depending on
how large the float is.
   
   Rust and Scala print the same float with a different number of decimal
digits, which makes it hard to produce the same error output as Spark in ANSI
mode. I'm not sure how to address that. We may need to relax the exact string
match criteria for float -> int conversions and warn users that, although the
error checking logic in vanilla Spark and Comet is the same, the error messages
differ.
   
   For example, for INT_MAX converted to a float, Rust prints 2.1474836E9,
whereas Java prints 2.14748365E9. Both printouts correspond to the same float,
2147483648.
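
A small Rust sketch of the formatting side of this example. `format_sci` is a hypothetical helper, and it assumes the value is printed with the standard `{:E}` formatter; the Java string is taken from the comment above rather than computed here.

```rust
/// Hypothetical helper: format an f32 in scientific notation with an
/// upper-case exponent marker.
fn format_sci(value: f32) -> String {
    // With no precision given, Rust emits the shortest digit string that
    // still parses back to the same f32.
    format!("{:E}", value)
}
```

`format_sci(i32::MAX as f32)` returns `2.1474836E9`, and both that string and Java's `2.14748365E9` parse back to the same `f32`, 2147483648. One possible workaround, then, would be for a test to compare the parsed values rather than the raw message text.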



