Jefffrey commented on code in PR #18518:
URL: https://github.com/apache/datafusion/pull/18518#discussion_r2501541689


##########
datafusion/spark/src/function/array/shuffle.rs:
##########
@@ -66,32 +68,108 @@ impl ScalarUDFImpl for SparkShuffle {
     }
 
     fn return_type(&self, arg_types: &[DataType]) -> Result<DataType> {
+        if arg_types.is_empty() {
+            return plan_err!("shuffle expects at least 1 argument");
+        }

Review Comment:
   ```suggestion
   ```
   
   We don't need this check



##########
datafusion/spark/src/function/array/shuffle.rs:
##########
@@ -47,7 +49,7 @@ impl Default for SparkShuffle {
 impl SparkShuffle {
     pub fn new() -> Self {
         Self {
-            signature: Signature::arrays(1, None, Volatility::Volatile),
+            signature: Signature::user_defined(Volatility::Volatile),

Review Comment:
   We should avoid using `user_defined` and make use of `TypeSignature::ArraySignature`, for example:
   
   
https://github.com/apache/datafusion/blob/7591919be7e6582ed7f6a8d0b033f7a0a8ad60f7/datafusion/expr-common/src/signature.rs#L1247-L1262
   
   - The `Index` argument can be used as the seed since it is an Int64 type
   - We can't use `Signature::array_and_index` directly since that would coerce FixedSizeLists to List arrays, which we don't want since we have a native implementation for FixedSizeLists
   - Ensure both call forms are supported: array only, and array + seed
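
To make the intended behaviour concrete, here is a toy model of the argument check described by the bullets above. It is only a sketch: `ArgType` and `check_shuffle_args` are hypothetical names invented for illustration and are not DataFusion's API. The real change would express the same rules through the signature types.

```rust
// Toy model only: these types are hypothetical and are NOT DataFusion's API.
#[derive(Debug, Clone, PartialEq)]
enum ArgType {
    List,
    FixedSizeList(usize),
    Int64,
    Utf8,
}

/// Returns the argument types the function would see after validation,
/// or an error message if the call is rejected.
fn check_shuffle_args(args: &[ArgType]) -> Result<Vec<ArgType>, String> {
    let is_array = |t: &ArgType| matches!(t, ArgType::List | ArgType::FixedSizeList(_));
    match args {
        // Array only: the array type is passed through unchanged,
        // so a FixedSizeList is not rewritten to a List.
        [array] if is_array(array) => Ok(vec![array.clone()]),
        // Array + seed: the second argument must be Int64.
        [array, ArgType::Int64] if is_array(array) => {
            Ok(vec![array.clone(), ArgType::Int64])
        }
        // Everything else, including an empty argument list, is rejected.
        _ => Err(format!("shuffle does not accept argument types {:?}", args)),
    }
}
```

In this model an empty argument list is already rejected by the check itself, so the function body never sees it.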



##########
datafusion/sqllogictest/test_files/spark/array/shuffle.slt:
##########
@@ -22,7 +22,7 @@ SELECT array_sort(shuffle([1, 2, 3, 4, 5, NULL])) = [NULL,1, 2, 3, 4, 5];
 true
 
 query B
-SELECT shuffle([1, 2, 3, 4, 5, NULL]) != [1, 2, 3, 4, 5, NULL];
+SELECT shuffle([1, 2, 3, 4, 5, NULL], 1) != [1, 2, 3, 4, 5, NULL];
 ----
 true

Review Comment:
   Now that a seed is passed, the output should be deterministic, so we can assert the exact output list instead of only checking that it differs from the input, e.g.
   
   ```sql
   query ?
   SELECT shuffle([1, 2, 3, 4, 5, NULL], 1);
   ----
   [2, 5, NULL, 3, 4, 1]
   ```
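
The reason an exact assertion becomes possible can be sketched as follows: a shuffle driven by a seeded generator returns the same permutation every time it is given the same seed. This is a minimal, dependency-free illustration. The SplitMix64 generator and the names `SplitMix64` and `seeded_shuffle` are assumptions made for this sketch, not what the PR uses, so the permutation it produces will not match the expected row in the test above.

```rust
/// Small deterministic generator (SplitMix64), chosen only to keep the sketch
/// dependency-free. It is an assumption, not the PR's generator.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Fisher-Yates shuffle driven by the seeded generator.
fn seeded_shuffle<T>(items: &mut [T], seed: u64) {
    let mut rng = SplitMix64(seed);
    for i in (1..items.len()).rev() {
        // Pick j in 0..=i. The modulo has a slight bias, acceptable for a sketch.
        let j = (rng.next_u64() % (i as u64 + 1)) as usize;
        items.swap(i, j);
    }
}
```

Calling `seeded_shuffle` twice with the same input and the same seed gives identical results, which is the property an exact-output test relies on.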



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

