tustvold opened a new issue #1218: URL: https://github.com/apache/arrow-rs/issues/1218
**Is your feature request related to a problem or challenge? Please describe what you are trying to do.**

Currently, when casting an array to a DictionaryArray, the cast kernel computes a new dictionary for the type. This dictionary will have unique values, but won't be sorted. However, in some cases uniqueness and/or sortedness may not be a priority, e.g. because a subsequent operation is going to filter out a large number of potential matches, and computing this dictionary is therefore wasted effort.

**Describe the solution you'd like**

Add two new CastOptions:

* `sort_dictionary` - if the result is a dictionary array, the dictionary will be sorted
* `pack_dictionary` - if the result is a dictionary array, the dictionary will be unique

This will give the cast kernel the leeway to construct a DictionaryArray by taking the provided array as the dictionary child data (values) and encoding `0..array.len()` in the keys array. This will of course need to fall back to computing a packed dictionary if the key size is too small to accommodate this.

This will also provide an obvious way to implement #506, as an array could be cast to itself with options to sort and/or pack the dictionary. This could be further combined with #1217 to avoid doing this computation when it is not necessary.

**Additional Context**

The concat kernel currently takes a similar approach of avoiding recomputing dictionaries.
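
To make the proposed behaviour concrete, here is a minimal sketch in plain Rust that deliberately does not use the arrow crate. Only the option names `sort_dictionary` and `pack_dictionary` come from the proposal; `DictCastOptions`, `SimpleDict` and `cast_to_dict` are hypothetical names invented for illustration, the key type is fixed to `u8`, and sorting is assumed to imply packing to keep the example short. It is not how the cast kernel is or would be implemented.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the proposed options (not arrow-rs's `CastOptions`).
#[derive(Clone, Copy, Default)]
struct DictCastOptions {
    sort_dictionary: bool,
    pack_dictionary: bool,
}

/// Toy dictionary array with `u8` keys: row `i` holds `values[keys[i]]`.
#[derive(Debug, PartialEq)]
struct SimpleDict {
    keys: Vec<u8>,
    values: Vec<String>,
}

/// Returns `None` if even a packed dictionary does not fit in `u8` keys.
fn cast_to_dict(input: &[&str], opts: DictCastOptions) -> Option<SimpleDict> {
    // Cheap path: reuse the input as the dictionary and emit keys 0..len.
    // Only possible when every row index is representable by the key type.
    let fits = input.len() <= u8::MAX as usize + 1;
    if !opts.sort_dictionary && !opts.pack_dictionary && fits {
        return Some(SimpleDict {
            keys: (0..input.len()).map(|i| i as u8).collect(),
            values: input.iter().map(|s| s.to_string()).collect(),
        });
    }

    // Packed path (also the fallback): unique values in first-seen order.
    let mut index: HashMap<&str, usize> = HashMap::new();
    let mut values: Vec<&str> = Vec::new();
    let mut keys: Vec<usize> = Vec::with_capacity(input.len());
    for &s in input {
        let k = *index.entry(s).or_insert_with(|| {
            values.push(s);
            values.len() - 1
        });
        keys.push(k);
    }

    // Assumption of this sketch: sorting operates on the packed dictionary.
    if opts.sort_dictionary {
        let mut order: Vec<usize> = (0..values.len()).collect();
        order.sort_by_key(|&i| values[i]);
        // remap[old_key] = new_key after sorting the values.
        let mut remap = vec![0usize; values.len()];
        for (new, &old) in order.iter().enumerate() {
            remap[old] = new;
        }
        values = order.iter().map(|&i| values[i]).collect();
        for k in keys.iter_mut() {
            *k = remap[*k];
        }
    }

    if values.len() > u8::MAX as usize + 1 {
        return None; // too many distinct values for the key type
    }
    Some(SimpleDict {
        keys: keys.into_iter().map(|k| k as u8).collect(),
        values: values.into_iter().map(String::from).collect(),
    })
}
```

With both options off, `cast_to_dict(&["b", "a", "b"], DictCastOptions::default())` keeps all three values and uses keys `0, 1, 2`. Setting `pack_dictionary` deduplicates the values, and setting `sort_dictionary` additionally orders them. An input longer than 256 rows falls back to the packed path even with both options off, mirroring the fallback described above.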
