kosiew opened a new issue, #7607:
URL: https://github.com/apache/arrow-rs/issues/7607
### **Describe the bug**
`DictionaryArray::is_null(i)` currently returns `false` for entries where
the dictionary index is valid, even if the value it points to in the dictionary
value array is `null`. This results in incorrect behavior when
dictionary-encoded arrays are used in higher-level operations like
`count(distinct ...)`, which rely on `is_null` to exclude nulls.
This violates expected null semantics: the dictionary key is not `null`,
but the value it resolves to is, so the entry should still be treated as `null`.
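
To illustrate the distinction, here is a minimal sketch that models a dictionary-encoded column with plain `Vec`s. It deliberately does not use the arrow API: `ToyDictionary`, `key_is_null`, `resolved_is_null`, and `count_distinct` are hypothetical names made up for this example. It only shows how a key-level null check and a value-level null check can disagree, and how that changes a distinct count.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a dictionary-encoded string column.
/// This is NOT arrow's `DictionaryArray`, just a model of the idea.
struct ToyDictionary {
    keys: Vec<Option<usize>>,
    values: Vec<Option<&'static str>>,
}

impl ToyDictionary {
    /// Key-level check: true only when the key slot itself is null.
    fn key_is_null(&self, i: usize) -> bool {
        self.keys[i].is_none()
    }

    /// Value-level check: also true when a valid key points at a null value.
    fn resolved_is_null(&self, i: usize) -> bool {
        match self.keys[i] {
            None => true,
            Some(k) => self.values[k].is_none(),
        }
    }

    /// Count distinct resolved values among the rows that the supplied
    /// check considers non-null.
    fn count_distinct(&self, is_null: impl Fn(&Self, usize) -> bool) -> usize {
        let mut seen = HashSet::new();
        for i in 0..self.keys.len() {
            if !is_null(self, i) {
                // Resolve the key to its value; a null value resolves to `None`.
                seen.insert(self.keys[i].and_then(|k| self.values[k]));
            }
        }
        seen.len()
    }
}

/// Same shape as the reproduction: every key points at the null slot 0.
fn sample() -> ToyDictionary {
    ToyDictionary {
        keys: vec![Some(0); 5],
        values: vec![None, Some("abc")],
    }
}

fn main() {
    let d = sample();
    // The key-level check lets the null value through as a "distinct" entry.
    println!("key-level:   {}", d.count_distinct(ToyDictionary::key_is_null));
    // The value-level check excludes every row.
    println!("value-level: {}", d.count_distinct(ToyDictionary::resolved_is_null));
}
```

With the key-level check the null value is counted as one distinct entry; with the value-level check no row is counted, which is the result a `count(distinct ...)` over this data would be expected to give.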
---
### **To Reproduce**
```rust
use arrow::array::{Array, ArrayRef, DictionaryArray, Int32Array, StringArray};
use std::sync::Arc;

fn main() {
    // Dictionary values: slot 0 is null, slot 1 is "abc".
    let dict_values = StringArray::from(vec![None, Some("abc")]);
    // All indices point to the null value in slot 0.
    let dict_indices = Int32Array::from(vec![0, 0, 0, 0, 0]);
    let dict = DictionaryArray::new(dict_indices, Arc::new(dict_values) as ArrayRef);

    // Reported behavior: prints `false` for every index.
    for i in 0..dict.len() {
        println!("is_null({}) = {}", i, dict.is_null(i));
    }
}
```
### Additional context
First raised here - https://github.com/apache/datafusion/issues/16228