Jefffrey commented on code in PR #20116:
URL: https://github.com/apache/datafusion/pull/20116#discussion_r2761901006


##########
datafusion/spark/src/function/hash/sha2.rs:
##########
@@ -112,10 +203,31 @@ fn sha2_binary_impl<'a, BinaryArrType>(
 ) -> ArrayRef
 where
     BinaryArrType: BinaryArrayType<'a>,
+{
+    sha2_binary_bitlen_iter(values, bit_lengths.iter())
+}
+
+fn sha2_binary_scalar_bitlen<'a, BinaryArrType>(
+    values: &BinaryArrType,
+    bit_length: i32,
+) -> ArrayRef
+where
+    BinaryArrType: BinaryArrayType<'a>,
+{
+    sha2_binary_bitlen_iter(values, std::iter::repeat(Some(bit_length)))

Review Comment:
   I was thinking along the lines of removing the match logic on the hot loop 
below, if we know the bit length for all values; I think it'll result in more 
verbose code but could be worth performance. Can look into this in a followup



##########
datafusion/spark/src/function/hash/sha2.rs:
##########
@@ -87,7 +88,97 @@ impl ScalarUDFImpl for SparkSha2 {
     }
 
     fn invoke_with_args(&self, args: ScalarFunctionArgs) -> 
Result<ColumnarValue> {
-        make_scalar_function(sha2_impl, vec![])(&args.args)
+        let [values, bit_lengths] = take_function_args(self.name(), 
args.args.iter())?;
+
+        match (values, bit_lengths) {
+            (
+                ColumnarValue::Scalar(value_scalar),
+                ColumnarValue::Scalar(ScalarValue::Int32(Some(bit_length))),
+            ) => {
+                if value_scalar.is_null() {

Review Comment:
   We should pull all null checks into a single branch at the top, e.g.
   
   ```rust
   match (values, bit_lengths) {
       (ColumnarValue::Scalar(s), _) | (_, ColumnarValue::Scalar(s)) if 
s.is_null() => { // return scalar null }
   ```
   
   This means we'd only need 4 arms:
   
   - One arm checking if either is null
   - One arm for scalar + scalar
   - One arm for array + scalar
   - Catch all (array + array, scalar + array)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to