albertlockett opened a new issue, #9321:
URL: https://github.com/apache/arrow-rs/issues/9321
**Is your feature request related to a problem or challenge? Please describe
what you are trying to do.**
We have a `RecordBatch` in our application with the following form, where
values can take on many types. A `type` column identifies which column to
read the value from, and there is one values column per type. The values
columns are dictionary encoded.
```
type | str_val | int_val | ...
-----|---------|---------| ...
str | "a" | null | ...
str | "b" | null | ...
int | null | 1 | ...
```
I was writing some code to construct the values columns from typed segments
by creating either all-null segments or segments containing values, and then
concatenating them together. Something like:
```rs
use arrow::array::{DictionaryArray, UInt8Array};
use arrow::compute::concat;

let str_val_col = concat(&[
    &DictionaryArray::new(str_val_keys, values.clone()),
    &DictionaryArray::new(UInt8Array::new_null(non_str_len), values.clone()),
    // ...
])?;
```
In my profiling, I noticed that `DictionaryArray::new` was slower than I
expected because it validates every key (checking that each non-null key is in
bounds for `values`), even for the all-null segments.
**Describe the solution you'd like**
When the dictionary keys are all null, there are no valid keys to
bounds-check, so I think this validation could be skipped here:
https://github.com/apache/arrow-rs/blob/main/arrow-array/src/array/dictionary_array.rs#L289-L314
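To make the idea concrete, here is a simplified, std-only model of the key bounds check with the proposed all-null fast path. It does not use arrow-rs: `Keys`, `validate_keys`, and the `Option<Vec<bool>>` validity mask are hypothetical stand-ins for `PrimitiveArray<K>` and its null buffer, and the actual fix in `try_new` may look different.

```rust
/// Hypothetical stand-in for a dictionary keys array: raw key values, an
/// optional validity mask (`None` means every slot is valid), and a cached
/// null count so that checking "all null" is O(1).
struct Keys {
    values: Vec<u8>,
    validity: Option<Vec<bool>>,
    null_count: usize,
}

impl Keys {
    fn new(values: Vec<u8>, validity: Option<Vec<bool>>) -> Self {
        let null_count = validity
            .as_ref()
            .map_or(0, |v| v.iter().filter(|&&b| !b).count());
        Keys { values, validity, null_count }
    }

    /// All-null keys of length `len`; the key slots are zero-filled.
    fn new_null(len: usize) -> Self {
        Keys { values: vec![0; len], validity: Some(vec![false; len]), null_count: len }
    }

    fn len(&self) -> usize {
        self.values.len()
    }

    fn is_valid(&self, i: usize) -> bool {
        self.validity.as_ref().map_or(true, |v| v[i])
    }
}

/// Checks that every non-null key is `< values_len`. Returns how many key
/// slots were inspected, so the effect of the fast path is observable.
fn validate_keys(keys: &Keys, values_len: usize) -> Result<usize, String> {
    // Hypothetical fast path: if every key is null, nothing can be out of bounds.
    if keys.null_count == keys.len() {
        return Ok(0);
    }
    let mut inspected = 0;
    for (idx, &k) in keys.values.iter().enumerate() {
        inspected += 1;
        // Null slots may hold any value, so only valid slots are checked.
        if keys.is_valid(idx) && (k as usize) >= values_len {
            return Err(format!(
                "key {k} at index {idx} is out of bounds for {values_len} values"
            ));
        }
    }
    Ok(inspected)
}
```

With this model, `validate_keys(&Keys::new_null(1_000), 3)` returns `Ok(0)` without touching any key, while a mixed or fully valid keys array is still scanned in full and an out-of-bounds valid key is still rejected. The shortcut relies on the null count being cheap to read; in arrow-rs the null buffer tracks its null count, which is what would make a similar check inexpensive there.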
**Describe alternatives you've considered**
I could use `new_unchecked`, but this has a few downsides:
- some code bases are wary of unsafe code
- if the `force_validate` feature is enabled, the keys are still validated
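```rs
// SAFETY: every key is null, so none can index out of bounds of `values`.
#[allow(unsafe_code)]
let null_segment = unsafe {
    DictionaryArray::new_unchecked(UInt8Array::new_null(len), values.clone())
};
let str_val_col = concat(&[
    &null_segment,
    &DictionaryArray::new(non_null_keys, values.clone()),
    // ...
])?;
```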