jhorstmann commented on PR #4407: URL: https://github.com/apache/arrow-rs/pull/4407#issuecomment-1589549258
Agree that the performance benefit of specialized kernels is probably not worth the added complexity and code.

> Currently calling a kernel with a DictionaryArray and a scalar returns a DictionaryArray, however, calling a kernel with two DictionaryArray returns a PrimitiveArray, the latter feels strange to me

This kind of makes sense to me. For many operations involving scalars, the dictionary values would still be unique afterwards, while an operation on two dictionaries would lead to a combinatorial explosion of distinct values, so dictionary-encoding the result is no longer beneficial. Operations like `array * 0` would of course leave the dictionary with nothing but duplicated values, so always returning a `PrimitiveArray` could be more consistent.

In our engine we had a similar issue with string replace and concat operations. We decided that such operations on two dictionary arrays would always result in a string array, while for a dictionary array and a literal string it is beneficial to build a new dictionary.

I did not review the code in detail, and maybe this is already happening, but could the dyn kernels automatically downcast/materialize dictionary arrays so that dictionary arrays are still supported as inputs?
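
To make the scalar versus two-dictionary distinction concrete, here is a minimal sketch in plain Rust. `DictArray`, `map_values`, `materialize` and `add_materialized` are hypothetical names for illustration only; this is not the arrow-rs `DictionaryArray` API and not the code in this PR.

```rust
/// Hypothetical, simplified stand-in for a dictionary-encoded array:
/// `keys[i]` indexes into `values`. Not the arrow-rs type.
#[derive(Debug, Clone, PartialEq)]
struct DictArray {
    keys: Vec<usize>,
    values: Vec<i64>,
}

impl DictArray {
    /// Array-with-scalar case: apply `f` once per dictionary value and
    /// reuse the keys unchanged, so the result stays dictionary-encoded.
    /// Note that `f` may collapse values (e.g. `|v| v * 0`), leaving
    /// duplicates in the new dictionary.
    fn map_values(&self, f: impl Fn(i64) -> i64) -> DictArray {
        DictArray {
            keys: self.keys.clone(),
            values: self.values.iter().map(|&v| f(v)).collect(),
        }
    }

    /// Decode into a plain vector, one element per key.
    fn materialize(&self) -> Vec<i64> {
        self.keys.iter().map(|&k| self.values[k]).collect()
    }
}

/// Array-with-array case: materialize both inputs and return a plain
/// vector. With dictionaries of size m and n the result can contain up
/// to m * n distinct values, so re-encoding it rarely pays off.
/// Assumes both inputs have the same length.
fn add_materialized(a: &DictArray, b: &DictArray) -> Vec<i64> {
    a.materialize()
        .into_iter()
        .zip(b.materialize())
        .map(|(x, y)| x + y)
        .collect()
}
```

In this sketch, adding two arrays that each have a two-entry dictionary can already yield four distinct results, whereas `map_values` touches only the two dictionary entries regardless of array length.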
