alamb commented on code in PR #7862:
URL: https://github.com/apache/arrow-datafusion/pull/7862#discussion_r1369268008


##########
datafusion/common/src/scalar.rs:
##########
@@ -2130,6 +2129,8 @@ impl ScalarValue {
 
     /// Retrieve ScalarValue for each row in `array`
     ///
+    /// Convert ListArray to `Vec<Vec<ScalarValue>>`, first `Vec` is for rows, 
second `Vec` is for elements in the list

Review Comment:
   👍  



##########
datafusion/physical-expr/src/aggregate/array_agg_distinct.rs:
##########
@@ -308,146 +304,4 @@ mod tests {
 
         check_merge_distinct_array_agg(col1, col2, expected, DataType::Int32)
     }
-
-    #[test]
-    fn distinct_array_agg_nested() -> Result<()> {

Review Comment:
   I don't understand the rationale for deleting these tests -- is the 
functionality covered elsewhere? If we deleted this code, won't we have less 
test coverage?



##########
datafusion/common/src/scalar.rs:
##########
@@ -2156,30 +2157,58 @@ impl ScalarValue {
     ///
     /// assert_eq!(scalar_vec, expected);
     /// ```
-    pub fn convert_array_to_scalar_vec(array: &dyn Array) -> 
Result<Vec<Vec<Self>>> {
-        let mut scalars = Vec::with_capacity(array.len());
+    pub fn convert_list_array_to_scalar_vec(array: &dyn Array) -> 
Result<Vec<Vec<Self>>> {
+        let mut scalars_vec = Vec::with_capacity(array.len());
+
+        if as_list_array(array).is_ok() {
+            let list_arr = as_list_array(array)?;
+            for index in 0..list_arr.len() {
+                let scalars = match list_arr.is_null(index) {
+                    true => Vec::new(),
+                    false => {
+                        let nested_array = list_arr.value(index);
+                        
ScalarValue::convert_list_array_to_scalar_vec(&nested_array)?
+                            .into_iter()
+                            .flatten()
+                            .collect()
+                    }
+                };
+                scalars_vec.push(scalars);
+            }
+        } else {
+            let scalars = 
ScalarValue::convert_non_list_array_to_scalars(array)?;

Review Comment:
   This seems like it would result in some very confusing behavior -- a single 
row vec -- If I pass a non `ListArray` to a function called 
`convert_list_array_to_scalar_vec` I would prefer to get an error back so I can 
track down the bug in my code rather than get back something unexpected and 
have to know to check for that



##########
datafusion/common/src/scalar.rs:
##########
@@ -2156,30 +2157,58 @@ impl ScalarValue {
     ///
     /// assert_eq!(scalar_vec, expected);
     /// ```
-    pub fn convert_array_to_scalar_vec(array: &dyn Array) -> 
Result<Vec<Vec<Self>>> {
-        let mut scalars = Vec::with_capacity(array.len());
+    pub fn convert_list_array_to_scalar_vec(array: &dyn Array) -> 
Result<Vec<Vec<Self>>> {
+        let mut scalars_vec = Vec::with_capacity(array.len());
+
+        if as_list_array(array).is_ok() {

Review Comment:
   In general, I think it would be better to avoid ignoring errors (which is 
what `is_ok()` does as each error requires at least one string allocation. This 
particular usage isn't likely performance critical but the pattern is not great 
and we have had trouble with it in the past.
   
   
   
   It makes sense to me that we would need two different functions for Arrays 
to ScalarValue (as the return type of the ListArray needs to be different). 
   
   I think it would be less surprising (and thus less error prone) if this 
function simply returned an error if the input was not a list array



##########
datafusion/common/src/scalar.rs:
##########
@@ -2130,6 +2129,8 @@ impl ScalarValue {
 
     /// Retrieve ScalarValue for each row in `array`
     ///
+    /// Convert ListArray to `Vec<Vec<ScalarValue>>`, first `Vec` is for rows, 
second `Vec` is for elements in the list

Review Comment:
   👍  



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to