jhorstmann commented on PR #4407:
URL: https://github.com/apache/arrow-rs/pull/4407#issuecomment-1589549258

   Agree that the performance benefit of specialized kernels is probably not 
worth the complexity and added code.
   
   > Currently calling a kernel with a DictionaryArray and a scalar returns a 
DictionaryArray, however, calling a kernel with two DictionaryArray returns a 
PrimitiveArray, the latter feels strange to me
   
   This kind of makes sense to me. For many operations involving scalars, the 
dictionary values would still be unique afterwards, while an operation on two 
dictionaries could lead to a combinatorial explosion of distinct values, so it 
is no longer beneficial to dictionary-encode the result. Operations like 
`array * 0` would of course leave only duplicated values in the dictionary, so 
always returning a `PrimitiveArray` could be more consistent.
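
   To illustrate the reasoning above, here is a minimal sketch using a toy model in plain Rust. `ToyDict`, `map_values`, `add_rows` and `distinct` are hypothetical names made up for this example, not arrow-rs APIs. The sketch assumes a dictionary array is just a list of keys indexing into a list of values.

```rust
use std::collections::HashSet;

/// Toy stand-in for a dictionary-encoded array: `keys[i]` indexes into `values`.
/// This is NOT the arrow-rs `DictionaryArray` type.
struct ToyDict {
    keys: Vec<usize>,
    values: Vec<i64>,
}

/// Array-with-scalar case: apply `op` once per dictionary value and reuse the keys.
fn map_values(d: &ToyDict, op: impl Fn(i64) -> i64) -> ToyDict {
    ToyDict {
        keys: d.keys.clone(),
        values: d.values.iter().map(|&v| op(v)).collect(),
    }
}

/// Array-with-array case: each row may pair a different left and right value,
/// so the result is computed row by row into a plain vector.
fn add_rows(a: &ToyDict, b: &ToyDict) -> Vec<i64> {
    a.keys
        .iter()
        .zip(&b.keys)
        .map(|(&ka, &kb)| a.values[ka] + b.values[kb])
        .collect()
}

/// Number of distinct values in a slice.
fn distinct(values: &[i64]) -> usize {
    values.iter().collect::<HashSet<_>>().len()
}
```

   With a three-value dictionary, `map_values(&a, |v| v + 1)` still yields three distinct values, while `map_values(&a, |v| v * 0)` collapses them to one. Adding it row-wise to a two-value dictionary can produce up to 3 * 2 = 6 distinct results, at which point a dictionary encoding of the output buys little.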
   
   In our engine we had a similar issue with string replace or concat 
operations. We decided that such operations on two dictionary arrays would 
always result in a string array, but that with a dictionary array and a literal 
string it would be beneficial to build a new dictionary.
   
   I did not review the code in detail, so maybe this is already happening, 
but could the dyn kernels automatically downcast/materialize dictionary arrays 
so that dictionary arrays are still supported as inputs?
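
   As a rough sketch of what I mean by materializing, again using a toy model in plain Rust: `ToyArray`, `materialize` and `add_dyn` are hypothetical names for this example and do not reflect how the arrow-rs dyn kernels are actually implemented.

```rust
/// Hypothetical stand-in for an array handed to a dynamically typed kernel.
enum ToyArray {
    Primitive(Vec<i64>),
    Dictionary { keys: Vec<usize>, values: Vec<i64> },
}

/// Expand a dictionary into one value per row; plain arrays pass through as a copy.
fn materialize(array: &ToyArray) -> Vec<i64> {
    match array {
        ToyArray::Primitive(v) => v.clone(),
        ToyArray::Dictionary { keys, values } => {
            keys.iter().map(|&k| values[k]).collect()
        }
    }
}

/// Toy "dyn" addition: materialize both inputs, then always return a plain array.
fn add_dyn(left: &ToyArray, right: &ToyArray) -> Result<Vec<i64>, String> {
    let (l, r) = (materialize(left), materialize(right));
    if l.len() != r.len() {
        return Err(format!("length mismatch: {} vs {}", l.len(), r.len()));
    }
    Ok(l.iter().zip(&r).map(|(a, b)| a + b).collect())
}
```

   In this sketch the caller can pass any mix of dictionary and plain inputs, and the output type no longer depends on how the inputs were encoded.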


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

