Jefffrey commented on code in PR #18662:
URL: https://github.com/apache/datafusion/pull/18662#discussion_r2525927446
##########
datafusion/spark/src/function/hash/crc32.rs:
##########
@@ -124,11 +113,12 @@ fn spark_crc32(args: &[ArrayRef]) -> Result<ArrayRef> {
let input = as_binary_view_array(input)?;
Ok(spark_crc32_impl(input.iter()))
}
- _ => {
- exec_err!(
- "Spark `crc32` function: argument must be binary or large binary, got {:?}",
- input.data_type()
- )
+ DataType::FixedSizeBinary(_) => {
+ let input = as_fixed_size_binary_array(input)?;
+ Ok(spark_crc32_impl(input.iter()))
+ }
+ dt => {
+ internal_err!("Unsupported data type for crc32: {dt}")
Review Comment:
This is an interesting case: it actually surfaced a bug in arrow-rs.
On main, this query fails as follows:
```
1. query failed: DataFusion error: Error during planning: Execution error:
Function 'crc32' user-defined coercion failed with "Execution error: `crc32`
function does not support type Dictionary(Int32, Utf8)" No function matches the
given name and argument types 'crc32(Dictionary(Int32, Utf8))'. You might need
to add explicit type casts.
Candidate functions:
crc32(UserDefined)
[SQL] select crc32(arrow_cast(null, 'Dictionary(Int32, Utf8)'))
at
/Users/jeffrey/Code/datafusion/datafusion/sqllogictest/test_files/spark/hash/crc32.slt:78
```
On this PR, it instead fails as follows:
```
1. query failed: DataFusion error: Optimizer rule 'simplify_expressions'
failed
caused by
Arrow error: Compute error: Internal Error: Cannot cast BinaryView to
BinaryArray of expected type
[SQL] select crc32(arrow_cast(null, 'Dictionary(Int32, Utf8)'))
at
/Users/jeffrey/Code/datafusion/datafusion/sqllogictest/test_files/spark/hash/crc32.slt:84
```
The error originates from here:
https://github.com/apache/arrow-rs/blob/2bc269c3eec23f6794fcd793b641ea4c08325d54/arrow-cast/src/cast/dictionary.rs#L107-L125
Our type coercion logic tries to cast the dictionary to a binary view
(which I believe is valid), but a bug in arrow-rs prevents the cast from
succeeding. I'll raise an issue on arrow-rs and add this test case to this
PR so we can track when the fix lands in DataFusion.
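For readers following along without the full file, here is a minimal,
std-only sketch of the dispatch shape in the hunk above: each supported
binary layout is turned into an iterator of optional byte slices and handed
to one shared routine, and anything else hits the catch-all error. The
`Column` enum, `crc32_all`, and `spark_crc32_sketch` are hypothetical
stand-ins for the Arrow arrays, `spark_crc32_impl`, and `spark_crc32`; they
are not the actual DataFusion code. The CRC-32 routine is a plain bitwise
implementation, assumed here only so the example compiles on its own.

```rust
/// Bitwise CRC-32 (reflected polynomial 0xEDB88320, as used by zlib).
/// Assumption: written out by hand so the sketch needs no external crate.
fn crc32(bytes: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &b in bytes {
        crc ^= u32::from(b);
        for _ in 0..8 {
            // mask is all ones when the low bit is set, otherwise zero
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

/// Hypothetical stand-in for `spark_crc32_impl`: nulls stay null, and each
/// non-null value maps to its checksum widened to i64.
fn crc32_all<'a, I>(values: I) -> Vec<Option<i64>>
where
    I: Iterator<Item = Option<&'a [u8]>>,
{
    values.map(|v| v.map(|b| i64::from(crc32(b)))).collect()
}

/// Hypothetical stand-in for the Arrow array types matched in the diff.
enum Column {
    Binary(Vec<Option<Vec<u8>>>),
    FixedSizeBinary(Vec<Option<[u8; 4]>>),
    Dictionary,
}

/// Mirrors the match in the hunk: supported layouts share one code path,
/// and the catch-all reports an internal error for anything else.
fn spark_crc32_sketch(col: &Column) -> Result<Vec<Option<i64>>, String> {
    match col {
        Column::Binary(v) => Ok(crc32_all(v.iter().map(|x| x.as_deref()))),
        Column::FixedSizeBinary(v) => Ok(crc32_all(
            v.iter().map(|x| x.as_ref().map(|a| a.as_slice())),
        )),
        Column::Dictionary => {
            Err("Unsupported data type for crc32: Dictionary".to_string())
        }
    }
}
```

In this sketch the `Dictionary` arm is reachable on purpose. In the real
function the expectation, as I read the change, is that coercion has already
cast the input to a supported binary type before the match runs, which would
explain why the catch-all is an internal error rather than an execution error.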
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]