jorisvandenbossche opened a new issue, #38034:
URL: https://github.com/apache/arrow/issues/38034

   Small example with a dictionary encoded column, and checking the dtype:
   
   ```
   In [59]: table = pa.table({'a': pa.array(["a", "b"]).dictionary_encode()})
   
   In [60]: table
   Out[60]: 
   pyarrow.Table
   a: dictionary<values=string, indices=int32, ordered=0>
   ----
   a: [  -- dictionary:
   ["a","b"]  -- indices:
   [0,1]]
   
   In [61]: obj = table.__dataframe__()
   
   In [62]: obj.get_column_by_name('a').dtype
   Out[62]: (<DtypeKind.CATEGORICAL: 23>, 32, 'L', '=')
   ```
   
   It correctly reports the kind as "categorical", but the bit width is 32 
(because of the int32 indices, so maybe correct?) while the format string is 
"L" (uint64, so definitely wrong?)
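
To make the mismatch concrete, here is a minimal sketch that checks whether the bit width and format string in such a dtype tuple agree. It does not call pyarrow; the tuple is copied from the output above, and `ARROW_FORMAT_BITS` and `format_matches_bitwidth` are hypothetical names used only for illustration. The mapping follows the Arrow C Data Interface format strings for primitive types. Which value the dtype should report for a dictionary column is not settled here; `"i"` (int32, matching the indices) is shown only as one candidate.

```python
# Bit widths implied by Arrow C Data Interface format strings
# (primitive integer and floating-point types only).
ARROW_FORMAT_BITS = {
    "c": 8, "C": 8,    # int8, uint8
    "s": 16, "S": 16,  # int16, uint16
    "i": 32, "I": 32,  # int32, uint32
    "l": 64, "L": 64,  # int64, uint64
    "e": 16, "f": 32, "g": 64,  # float16, float32, float64
}


def format_matches_bitwidth(bit_width, format_str):
    """Return True if the format string implies the given bit width.

    Hypothetical helper for illustration; not part of pyarrow.
    """
    expected = ARROW_FORMAT_BITS.get(format_str)
    if expected is None:
        raise ValueError(f"unsupported format string: {format_str!r}")
    return expected == bit_width


# The dtype tuple reported above: (kind, bit width, format string, endianness).
# 23 is the integer value of DtypeKind.CATEGORICAL.
kind, bit_width, format_str, endianness = (23, 32, "L", "=")

print(format_matches_bitwidth(bit_width, format_str))  # False: "L" implies 64 bits
print(format_matches_bitwidth(bit_width, "i"))         # True: "i" implies 32 bits
```

A check like this could be turned into a regression test once the intended dtype for dictionary columns is decided.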
   
   cc @AlenkaF 

