albertlockett opened a new issue, #9321:
URL: https://github.com/apache/arrow-rs/issues/9321

   **Is your feature request related to a problem or challenge? Please describe 
what you are trying to do.**
   
   We have a `RecordBatch` in our application whose values can take on many 
types. A `type` column identifies which column to read each value from, and 
there is one values column per type. The values columns are dictionary encoded.
   ```
   type | str_val | int_val | ...
   -----|---------|---------| ...
   str  |   "a"   |   null  | ...
   str  |   "b"   |   null  | ...
   int  |   null  |   1     | ...
   ```
   
   I was writing some code to construct the values columns from typed segments, 
by creating either all-null segments, or segments containing values, and 
concatenating them together. Something like:
   ```rs
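   let str_val_col = concat(&[
     // segment that holds actual string values
     &DictionaryArray::new(str_val_keys, values.clone()),
     // all-null segment covering the rows of other types
     &DictionaryArray::new(UInt8Array::new_null(non_str_len), values.clone()),
     // ...
   ])?;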
   ```
   In my profiling, I noticed that `DictionaryArray::new` was slower than I 
expected because it was validating all the keys.
   
   **Describe the solution you'd like**
   
   When the dictionary keys are all null, there are no keys to bounds-check, 
so I think we can maybe skip this validation here?
   
https://github.com/apache/arrow-rs/blob/main/arrow-array/src/array/dictionary_array.rs#L289-L314
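To illustrate the idea, here is a minimal std-only sketch of the proposed fast path. It is a simplified model, not the arrow-rs implementation: `Keys` and `validate_keys` are hypothetical names, `None` stands in for a null key, and the sketch assumes the null count is already known when validation runs, so the all-null check costs O(1) instead of a scan over every key.

```rust
/// Hypothetical stand-in for a dictionary key array; `None` models a null key.
/// This is NOT the arrow-rs `PrimitiveArray`, just a model for illustration.
struct Keys {
    values: Vec<Option<u8>>,
    /// Number of null keys, computed once at construction (assumed cached).
    null_count: usize,
}

impl Keys {
    fn new(values: Vec<Option<u8>>) -> Self {
        let null_count = values.iter().filter(|k| k.is_none()).count();
        Self { values, null_count }
    }

    /// Builds an all-null key array of the given length.
    fn new_null(len: usize) -> Self {
        Self { values: vec![None; len], null_count: len }
    }

    fn len(&self) -> usize {
        self.values.len()
    }
}

/// Checks that every non-null key indexes into a dictionary of `dict_len` values.
fn validate_keys(keys: &Keys, dict_len: usize) -> Result<(), String> {
    // Proposed fast path: if every key is null, no key can be out of bounds,
    // so the per-key scan below can be skipped entirely.
    if keys.null_count == keys.len() {
        return Ok(());
    }
    // Slow path: bounds-check each non-null key against the dictionary length.
    for (i, key) in keys.values.iter().enumerate() {
        if let Some(k) = key {
            if *k as usize >= dict_len {
                return Err(format!(
                    "key {k} at position {i} is out of bounds for dictionary of length {dict_len}"
                ));
            }
        }
    }
    Ok(())
}
```

With this model, `validate_keys(&Keys::new_null(n), dict_len)` returns `Ok(())` without touching the keys, while arrays that contain at least one non-null key still go through the full check.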
   
   **Describe alternatives you've considered**
   
   I could use `new_unchecked`, but this has a few downsides:
   - some codebases are wary of unsafe code
   - if the `force_validate` feature is enabled, the keys are still validated
   
   ```rs
   concat(&[
     // SAFETY: every key is null, so no key can index out of bounds
     #[allow(unsafe_code)]
     &unsafe { DictionaryArray::new_unchecked(UInt8Array::new_null(len), values.clone()) },
     &DictionaryArray::new(non_null_keys, values.clone()),
     // ...
   ])
   ```
   **Additional context**
   

