felipecrv commented on issue #40128:
URL: https://github.com/apache/arrow/issues/40128#issuecomment-1954354911

   My take here is that dictionaries are for compression and might be 
aggressively further-compressed by some operations [1], but `cast` is special 
and *should preserve* the original dictionary because the goal of a cast is to 
change only the encoding of the data with as little as possible impact on its 
semantic properties.
   
   I would keep the fast path that and make only cast kernels the exception. I 
don't know exactly how without looking at the code.
   
   My reasoning: we don't want code accidentally relying on a guarantee that is 
stronger than "the dictionary contains the values that are referenced at least 
once". The path to full determinism here is probably being more aggressive in 
removing values, than preserving them.
   
   [1] many operations act on the `array[0..length)` values and might gather 
only the dictionary values that have references to them


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to