alamb opened a new issue, #4788:
URL: https://github.com/apache/arrow-rs/issues/4788

   **Describe the bug**
   The `cmp` kernels do not ignore the key values stored in null slots of a `DictionaryArray`, as they should. An out-of-range key in a null slot causes a panic.
   
   **To Reproduce**
   Run this program:
   
   ```rust
   use std::sync::Arc;
   
   use arrow::array::{Int32Array, StringArray, BooleanBufferBuilder};
   use arrow::buffer::{ScalarBuffer, NullBuffer};
   use arrow::{array::DictionaryArray};
   use arrow::util::pretty::pretty_format_columns;
   
   fn main() {
       // Logically like this:
       // keys: PrimitiveArray<Int32>
       // [
       //   null,
       //   1,
       //   0,
       // ]
       // values: StringArray
       // [
       //   "us-west",
       //   "us-east",
       // ]}
       let values = StringArray::from(vec![Some("us-west"), Some("us-east")]);
   
       let mut nulls = BooleanBufferBuilder::new(3);
       nulls.append(false); // null
       nulls.append(true);
       nulls.append(true);
       let nulls: NullBuffer = nulls.finish().into();
   
       // key values
       //
       // since element 0 is NULL, the index value of 100 should be
       // ignored
       let key_values = ScalarBuffer::from(vec![100i32, 1i32, 0i32]);
       let keys = Int32Array::new(key_values, Some(nulls));
   
       let col = DictionaryArray::try_new(keys, Arc::new(values)).unwrap();
   
       println!("Input col: {col:?}");
   
       let comparison = arrow::compute::kernels::cmp::neq(
           &col.slice(0, col.len() - 1),
           &col.slice(1, col.len() - 1),
       )
       .expect("cmp");
   
       println!("comparison: {}", pretty_format_columns("neq", &[Arc::new(comparison) as _]).unwrap());
   }
   
   ```
   
   This results in the following output, ending in a panic:
   ```
   Input col: DictionaryArray {keys: PrimitiveArray<Int32>
   [
     null,
     1,
     0,
   ] values: StringArray
   [
     "us-west",
     "us-east",
   ]}
   
   thread 'main' panicked at 'called `Option::unwrap()` on a `None` value', /Users/alamb/Software/arrow-rs/arrow-array/src/array/byte_array.rs:294:38
   stack backtrace:
      0: rust_begin_unwind
                at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/std/src/panicking.rs:593:5
      1: core::panicking::panic_fmt
                at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/core/src/panicking.rs:67:14
      2: core::panicking::panic
                at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/core/src/panicking.rs:117:5
      3: core::option::Option<T>::unwrap
                at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/core/src/option.rs:935:21
      4: arrow_array::array::byte_array::GenericByteArray<T>::value_unchecked
                at /Users/alamb/Software/arrow-rs/arrow-array/src/array/byte_array.rs:294:13
      5: <&arrow_array::array::byte_array::GenericByteArray<T> as arrow_ord::cmp::ArrayOrd>::value_unchecked
                at /Users/alamb/Software/arrow-rs/arrow-ord/src/cmp.rs:522:9
      6: arrow_ord::cmp::apply_op_vectored::{{closure}}
                at /Users/alamb/Software/arrow-rs/arrow-ord/src/cmp.rs:450:12
      7: arrow_ord::cmp::collect_bool
                at /Users/alamb/Software/arrow-rs/arrow-ord/src/cmp.rs:388:24
      8: arrow_ord::cmp::apply_op_vectored
                at /Users/alamb/Software/arrow-rs/arrow-ord/src/cmp.rs:447:5
      9: arrow_ord::cmp::apply
                at /Users/alamb/Software/arrow-rs/arrow-ord/src/cmp.rs:326:17
     10: arrow_ord::cmp::compare_op::{{closure}}
                at /Users/alamb/Software/arrow-rs/arrow-ord/src/cmp.rs:219:29
     11: arrow_ord::cmp::compare_op
                at /Users/alamb/Software/arrow-rs/arrow-ord/src/cmp.rs:286:44
     12: arrow_ord::cmp::neq
                at /Users/alamb/Software/arrow-rs/arrow-ord/src/cmp.rs:86:5
     13: rust_arrow_playground::main
                at ./src/main.rs:40:23
     14: core::ops::function::FnOnce::call_once
                at /rustc/5680fa18feaa87f3ff04063800aec256c3d4b4be/library/core/src/ops/function.rs:250:5
   note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
   
   ```
   **Expected behavior**
   The code should not panic: the key stored in a null slot should be ignored, whatever its value.
   
   Specifically, if you change the key values from
   ```rust
       let key_values = ScalarBuffer::from(vec![100i32, 1i32, 0i32]);
   ```
   
   to
   
   ```rust
       let key_values = ScalarBuffer::from(vec![0i32, 1i32, 0i32]);
   ```
   
   Then the reproducer completes without error:
   
   ```
   comparison: +------+
   | neq  |
   +------+
   |      |
   | true |
   +------+
   ```
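   
   To spell out the semantics I would expect, here is a minimal std-only sketch. It is not the arrow-rs implementation: the `Dict` struct, its fields, and this free-standing `neq` function are hypothetical stand-ins for a dictionary array and the comparison kernel. The point is only that the key of a null slot is never used to index into the values, so an out-of-range key such as `100` is harmless.
   
   ```rust
   // Simplified model of a dictionary array (illustrative only, not the arrow-rs API):
   // `keys[i]` indexes into `values`, and `validity[i] == false` marks slot `i` as null.
   struct Dict<'a> {
       keys: Vec<i32>,
       validity: Vec<bool>,
       values: Vec<&'a str>,
   }
   
   impl<'a> Dict<'a> {
       /// Returns the logical value at slot `i`, or `None` if the slot is null.
       /// The key of a null slot is never read, so it may hold any value.
       fn get(&self, i: usize) -> Option<&'a str> {
           if !self.validity[i] {
               return None;
           }
           Some(self.values[self.keys[i] as usize])
       }
   }
   
   /// Null-aware `neq` over two index ranges of the same dictionary:
   /// the result is null (`None`) whenever either side is null.
   fn neq(
       d: &Dict,
       left: std::ops::Range<usize>,
       right: std::ops::Range<usize>,
   ) -> Vec<Option<bool>> {
       left.zip(right)
           .map(|(l, r)| match (d.get(l), d.get(r)) {
               (Some(a), Some(b)) => Some(a != b),
               _ => None,
           })
           .collect()
   }
   
   fn main() {
       let d = Dict {
           keys: vec![100, 1, 0], // 100 is out of range, but slot 0 is null
           validity: vec![false, true, true],
           values: vec!["us-west", "us-east"],
       };
       // Mirrors neq(col.slice(0, 2), col.slice(1, 2)) from the reproducer.
       let result = neq(&d, 0..2, 1..3);
       println!("{result:?}");
   }
   ```
   
   In this model the first result is null and the second is `true`, which matches the table above regardless of the key stored in the null slot.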
   
   **Additional context**
   
   Found while updating https://github.com/influxdata/influxdb_iox/pull/8577
   
   
   

