AlenkaF commented on issue #33059:
URL: https://github.com/apache/arrow/issues/33059#issuecomment-4388939904

   I will close this issue as it has been fixed by 
https://github.com/apache/arrow/pull/14106.
   
   As for the promotion of integer index types in the case of the dictionary 
type, there is a related comment in the docstrings:
   
   
https://github.com/apache/arrow/blob/23cd1ff8f4e33b3207875e3395d2d6b1aeb1edc2/python/pyarrow/array.pxi#L188-L193
   
   but I am not sure that `uint` being promoted to `int` of the same size fits 
here, as this change seems to happen even when it is not necessary. I asked 
Copilot to help me dig through the code. It seems this is expected on the C++ 
side, see:
   
   
https://github.com/apache/arrow/blob/61c96ca0612ae46ef05becfeb5f987197180cb2e/cpp/src/arrow/array/builder_dict.h#L671-L674
   
   `DictionaryBuilder` uses `AdaptiveIntBuilder` to create indices and it does 
not utilize `AdaptiveUIntBuilder`. Looking at the format docs, I also found:
   
   > Since unsigned integers can be more difficult to work with in some cases 
(e.g. in the JVM), we recommend preferring signed integers over unsigned 
integers for representing dictionary indices.
   
   here: 
https://arrow.apache.org/docs/format/Columnar.html#dictionary-encoded-layout
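   
   To make the signed-index preference concrete, here is a minimal pure-Python 
sketch of how an adaptive builder might pick the narrowest signed width for 
dictionary indices. The helper `smallest_signed_index_type` is hypothetical and 
not part of pyarrow or Arrow C++; it only illustrates the general idea and does 
not reproduce the actual `AdaptiveIntBuilder` logic.
   
```python
# Hypothetical helper for illustration only; not a pyarrow or Arrow C++ API.
# Assumption: indices are always stored as signed integers, and the width
# grows through 8, 16, 32, 64 bits as the dictionary gets larger.
SIGNED_WIDTHS = (8, 16, 32, 64)


def smallest_signed_index_type(num_dictionary_values: int) -> str:
    """Return the narrowest signed integer type name that can index
    a dictionary with the given number of values."""
    if num_dictionary_values < 0:
        raise ValueError("number of dictionary values must be non-negative")
    # The largest index that has to be representable (0 for an empty dictionary).
    max_index = max(num_dictionary_values - 1, 0)
    for bits in SIGNED_WIDTHS:
        # A signed integer with `bits` bits holds values up to 2**(bits-1) - 1.
        if max_index <= 2 ** (bits - 1) - 1:
            return f"int{bits}"
    raise OverflowError("dictionary too large for a 64-bit signed index")


if __name__ == "__main__":
    for n in (3, 128, 129, 200):
        print(n, smallest_signed_index_type(n))
```
   
   Under these assumptions, a dictionary with 200 values gets `int16` indices 
even though `uint8` would be wide enough, which is the trade-off the format 
docs accept in favor of signed indices.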
   
   A separate issue can be opened in case this design decision needs to be 
discussed further.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
