Jefffrey commented on code in PR #18662:
URL: https://github.com/apache/datafusion/pull/18662#discussion_r2525927446


##########
datafusion/spark/src/function/hash/crc32.rs:
##########
@@ -124,11 +113,12 @@ fn spark_crc32(args: &[ArrayRef]) -> Result<ArrayRef> {
             let input = as_binary_view_array(input)?;
             Ok(spark_crc32_impl(input.iter()))
         }
-        _ => {
-            exec_err!(
-                "Spark `crc32` function: argument must be binary or large binary, got {:?}",
-                input.data_type()
-            )
+        DataType::FixedSizeBinary(_) => {
+            let input = as_fixed_size_binary_array(input)?;
+            Ok(spark_crc32_impl(input.iter()))
+        }
+        dt => {
+            internal_err!("Unsupported data type for crc32: {dt}")

Review Comment:
   This is an interesting case: it actually surfaced a bug in arrow-rs.
   
   On main, this fails as follows:
   
   ```
   1. query failed: DataFusion error: Error during planning: Execution error: Function 'crc32' user-defined coercion failed with "Execution error: `crc32` function does not support type Dictionary(Int32, Utf8)" No function matches the given name and argument types 'crc32(Dictionary(Int32, Utf8))'. You might need to add explicit type casts.
           Candidate functions:
           crc32(UserDefined)
   [SQL] select crc32(arrow_cast(null, 'Dictionary(Int32, Utf8)'))
   at /Users/jeffrey/Code/datafusion/datafusion/sqllogictest/test_files/spark/hash/crc32.slt:78
   ```
   
   On this PR, it instead fails as follows:
   
   ```
   1. query failed: DataFusion error: Optimizer rule 'simplify_expressions' failed
   caused by
   Arrow error: Compute error: Internal Error: Cannot cast BinaryView to BinaryArray of expected type
   [SQL] select crc32(arrow_cast(null, 'Dictionary(Int32, Utf8)'))
   at /Users/jeffrey/Code/datafusion/datafusion/sqllogictest/test_files/spark/hash/crc32.slt:84
   ```
   
   The error originates from here: https://github.com/apache/arrow-rs/blob/2bc269c3eec23f6794fcd793b641ea4c08325d54/arrow-cast/src/cast/dictionary.rs#L107-L125
   
   So our type coercion logic tries to cast the dictionary to a binary view
   (which I believe is valid), but arrow-rs has a bug that prevents the cast
   from happening. I'll raise an issue on arrow-rs and add this test case to
   this PR so we can track when the fix lands in DataFusion.
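
For illustration only, here is a toy model of the failure mode described above. It is a sketch under one assumption: that the cast fails because the binary-view path only accepts dictionaries whose values are already binary, which is what the error message suggests. None of the types or functions below exist in arrow-rs or DataFusion; `ToyDictionary`, `DictValues`, `cast_strict`, `cast_lenient`, and `take` are all hypothetical names.

```rust
// Toy model only: none of these types or functions exist in arrow-rs.

/// Hypothetical stand-in for the value array inside a dictionary.
#[derive(Debug)]
enum DictValues {
    Utf8(Vec<String>),
    Binary(Vec<Vec<u8>>),
}

/// Hypothetical stand-in for a dictionary-encoded column:
/// each key is an index into `values`, or `None` for a null row.
#[derive(Debug)]
struct ToyDictionary {
    keys: Vec<Option<usize>>,
    values: DictValues,
}

/// Expands keys into owned byte values, keeping nulls as `None`.
fn take(keys: &[Option<usize>], values: &[Vec<u8>]) -> Vec<Option<Vec<u8>>> {
    keys.iter().map(|&k| k.map(|i| values[i].clone())).collect()
}

/// Models the assumed failure mode: only binary values are accepted,
/// so a dictionary with string values is rejected.
fn cast_strict(dict: &ToyDictionary) -> Result<Vec<Option<Vec<u8>>>, String> {
    match &dict.values {
        DictValues::Binary(values) => Ok(take(&dict.keys, values)),
        DictValues::Utf8(_) => Err(
            "Internal Error: Cannot cast BinaryView to BinaryArray of expected type".to_string(),
        ),
    }
}

/// Models one possible fix: string values are first reinterpreted
/// as their UTF-8 bytes, then expanded like binary values.
fn cast_lenient(dict: &ToyDictionary) -> Result<Vec<Option<Vec<u8>>>, String> {
    match &dict.values {
        DictValues::Binary(values) => Ok(take(&dict.keys, values)),
        DictValues::Utf8(values) => {
            let bytes: Vec<Vec<u8>> = values.iter().map(|s| s.as_bytes().to_vec()).collect();
            Ok(take(&dict.keys, &bytes))
        }
    }
}

fn main() {
    let dict = ToyDictionary {
        keys: vec![Some(0), None, Some(1), Some(0)],
        values: DictValues::Utf8(vec!["a".to_string(), "bc".to_string()]),
    };
    println!("strict:  {:?}", cast_strict(&dict));
    println!("lenient: {:?}", cast_lenient(&dict));
}
```

The strict variant rejects a string-valued dictionary even though every string is valid binary data, which is why the cast requested by type coercion looks legitimate while still failing. The actual arrow-rs code at the link above may differ in its details.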



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

