Jefffrey opened a new issue, #17692:
URL: https://github.com/apache/datafusion/issues/17692

   ### Describe the bug
   
   When using `array_to_string` on a nested list, it doesn't properly check the 
element is null when doing conversion.
   
   ### To Reproduce
   
   Example SLT case:
   
   
https://github.com/apache/datafusion/blob/d55fb6d9f957d687a5a8334c7f4957cf76b65f96/datafusion/sqllogictest/test_files/array.slt#L4833-L4836
   
   Here the input list looks like `[[NULL, 'a'], NULL]` but the output string 
is `-,a,-,-`, where `-` is the replacement for nulls; we can see the output 
string has three nulls whereas our list only had two.
   
   Another example are these Rust test cases:
   
   ```rust
   // In datafusion/functions-nested/src/string.rs
   #[cfg(test)]
   mod tests {
       use super::array_to_string_inner;
       use arrow::array::{
           Array, ArrayRef, AsArray, FixedSizeListArray, Int64Array, ListArray, 
StringArray,
       };
       use arrow::buffer::{NullBuffer, OffsetBuffer};
       use arrow::datatypes::{DataType, Field};
       use datafusion_common::Result;
       use std::sync::Arc;
   
       #[test]
       fn test_list() -> Result<()> {
           // Offsets + Values looks like this: [[1, NULL], [3, 4], [5, 6]]
           // But with NullBuffer it's like this: [[1, NULL], NULL, [5, 6]]
           let list_inner = Arc::new(ListArray::new(
               Field::new_list_field(DataType::Int64, true).into(),
               OffsetBuffer::new(vec![0, 2, 4, 6].into()),
               Arc::new(Int64Array::from(vec![
                   Some(1),
                   None,
                   Some(3),
                   Some(4),
                   Some(5),
                   Some(6),
               ])),
               Some(NullBuffer::from(vec![true, false, true])),
           ));
   
           let list_outer = Arc::new(ListArray::new(
               Field::new_list_field(list_inner.data_type().clone(), 
true).into(),
               OffsetBuffer::new(vec![0, 3].into()),
               Arc::clone(&list_inner) as ArrayRef,
               None,
           ));
   
           let delimiters = Arc::new(StringArray::from(vec!["-"]));
           let null_strings = Arc::new(StringArray::from(vec!["[NULL]"]));
   
           let output = array_to_string_inner(&[list_outer, delimiters, 
null_strings])?;
           let value = output.as_string::<i32>().value(0);
           assert_eq!(value, "1-[NULL]-3-4-5-6");
   
           Ok(())
       }
   
       #[test]
       fn test_fsl() -> Result<()> {
           // Size + Values looks like this: [[1, NULL], [3, 4], [5, 6]]
           // But with NullBuffer it's like this: [[1, NULL], NULL, [5, 6]]
           let list_inner = Arc::new(FixedSizeListArray::new(
               Field::new_list_field(DataType::Int64, true).into(),
               2,
               Arc::new(Int64Array::from(vec![
                   Some(1),
                   None,
                   Some(3),
                   Some(4),
                   Some(5),
                   Some(6),
               ])),
               Some(NullBuffer::from(vec![true, false, true])),
           ));
   
           let list_outer = Arc::new(ListArray::new(
               Field::new_list_field(list_inner.data_type().clone(), 
true).into(),
               OffsetBuffer::new(vec![0, 3].into()),
               Arc::clone(&list_inner) as ArrayRef,
               None,
           ));
   
           let delimiters = Arc::new(StringArray::from(vec!["-"]));
           let null_strings = Arc::new(StringArray::from(vec!["[NULL]"]));
   
           let output = array_to_string_inner(&[list_outer, delimiters, 
null_strings])?;
           let value = output.as_string::<i32>().value(0);
           assert_eq!(value, "1-[NULL]-3-4-5-6");
   
           Ok(())
       }
   }
   ```
   
   We can see both output the `3-4` elements, even though in their original 
lists these are not valid since the List slot itself is null.
   
   ### Expected behavior
   
   Null lists should not be accessed as if they are a valid list.
   
   Root cause seems here:
   
   
https://github.com/apache/datafusion/blob/d55fb6d9f957d687a5a8334c7f4957cf76b65f96/datafusion/functions-nested/src/string.rs#L370-L379
   
   Specifically line 374, accessing `list_array.value(i)` without a null check. 
The same goes for the other two arms below it.
   
   ### Additional context
   
   I'm concerned this similar issue may be in other functions 🤔 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to