Jefffrey opened a new issue, #17692: URL: https://github.com/apache/datafusion/issues/17692
### Describe the bug When using `array_to_string` on a nested list, it doesn't properly check the element is null when doing conversion. ### To Reproduce Example SLT case: https://github.com/apache/datafusion/blob/d55fb6d9f957d687a5a8334c7f4957cf76b65f96/datafusion/sqllogictest/test_files/array.slt#L4833-L4836 Here the input list looks like `[[NULL, 'a'], NULL]` but the output string is `-,a,-,-`, where `-` is the replacement for nulls; we can see the output string has three nulls whereas our list only had two. Another example are these Rust test cases: ```rust // In datafusion/functions-nested/src/string.rs #[cfg(test)] mod tests { use super::array_to_string_inner; use arrow::array::{ Array, ArrayRef, AsArray, FixedSizeListArray, Int64Array, ListArray, StringArray, }; use arrow::buffer::{NullBuffer, OffsetBuffer}; use arrow::datatypes::{DataType, Field}; use datafusion_common::Result; use std::sync::Arc; #[test] fn test_list() -> Result<()> { // Offsets + Values looks like this: [[1, NULL], [3, 4], [5, 6]] // But with NullBuffer it's like this: [[1, NULL], NULL, [5, 6]] let list_inner = Arc::new(ListArray::new( Field::new_list_field(DataType::Int64, true).into(), OffsetBuffer::new(vec![0, 2, 4, 6].into()), Arc::new(Int64Array::from(vec![ Some(1), None, Some(3), Some(4), Some(5), Some(6), ])), Some(NullBuffer::from(vec![true, false, true])), )); let list_outer = Arc::new(ListArray::new( Field::new_list_field(list_inner.data_type().clone(), true).into(), OffsetBuffer::new(vec![0, 3].into()), Arc::clone(&list_inner) as ArrayRef, None, )); let delimiters = Arc::new(StringArray::from(vec!["-"])); let null_strings = Arc::new(StringArray::from(vec!["[NULL]"])); let output = array_to_string_inner(&[list_outer, delimiters, null_strings])?; let value = output.as_string::<i32>().value(0); assert_eq!(value, "1-[NULL]-3-4-5-6"); Ok(()) } #[test] fn test_fsl() -> Result<()> { // Size + Values looks like this: [[1, NULL], [3, 4], [5, 6]] // But with NullBuffer it's like this: [[1, NULL], NULL, [5, 6]] let list_inner = Arc::new(FixedSizeListArray::new( Field::new_list_field(DataType::Int64, true).into(), 2, Arc::new(Int64Array::from(vec![ Some(1), None, Some(3), Some(4), Some(5), Some(6), ])), Some(NullBuffer::from(vec![true, false, true])), )); let list_outer = Arc::new(ListArray::new( Field::new_list_field(list_inner.data_type().clone(), true).into(), OffsetBuffer::new(vec![0, 3].into()), Arc::clone(&list_inner) as ArrayRef, None, )); let delimiters = Arc::new(StringArray::from(vec!["-"])); let null_strings = Arc::new(StringArray::from(vec!["[NULL]"])); let output = array_to_string_inner(&[list_outer, delimiters, null_strings])?; let value = output.as_string::<i32>().value(0); assert_eq!(value, "1-[NULL]-3-4-5-6"); Ok(()) } } ``` We can see both output the `3-4` elements, even though in their original lists these are not valid since the List slot itself is null. ### Expected behavior Null lists should not be accessed as if they are a valid list. Root cause seems here: https://github.com/apache/datafusion/blob/d55fb6d9f957d687a5a8334c7f4957cf76b65f96/datafusion/functions-nested/src/string.rs#L370-L379 Specifically line 374, accessing `list_array.value(i)` without a null check. The same goes for the other two arms below it. ### Additional context I'm concerned this similar issue may be in other functions 🤔 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
