jorisvandenbossche opened a new issue, #38034:
URL: https://github.com/apache/arrow/issues/38034
A small example with a dictionary-encoded column, checking the dtype reported through `__dataframe__()`:
```
In [59]: table = pa.table({'a': pa.array(["a", "b"]).dictionary_encode()})
In [60]: table
Out[60]:
pyarrow.Table
a: dictionary<values=string, indices=int32, ordered=0>
----
a: [ -- dictionary:
["a","b"] -- indices:
[0,1]]
In [61]: obj = table.__dataframe__()
In [62]: obj.get_column_by_name('a').dtype
Out[62]: (<DtypeKind.CATEGORICAL: 23>, 32, 'L', '=')
```
It correctly reports the kind as "categorical", but the bit width is 32
(presumably because of the int32 indices, so maybe correct?) and the format
string is "L" (uint64, so definitely wrong?)
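
To make the mismatch concrete, here is a minimal sketch that compares the bit width in the dtype tuple against the width implied by the format string. It uses only the integer format characters from the Arrow C data interface; `ARROW_INT_FORMATS` and `check_dtype` are hypothetical names introduced for illustration, not part of pyarrow. Which format string the protocol should report for a categorical column is left open here; the sketch only checks internal consistency.

```python
# Integer format characters from the Arrow C data interface,
# mapped to (signedness, bit width).
ARROW_INT_FORMATS = {
    "c": ("int", 8), "C": ("uint", 8),
    "s": ("int", 16), "S": ("uint", 16),
    "i": ("int", 32), "I": ("uint", 32),
    "l": ("int", 64), "L": ("uint", 64),
}


def check_dtype(dtype):
    """Hypothetical helper: list inconsistencies in an interchange dtype tuple.

    `dtype` is the (kind, bitwidth, format_string, byteorder) tuple
    returned by the column's `dtype` property.
    """
    kind, bitwidth, fmt, byteorder = dtype
    problems = []
    info = ARROW_INT_FORMATS.get(fmt)
    if info is None:
        problems.append(f"format {fmt!r} is not an integer format")
    elif info[1] != bitwidth:
        problems.append(
            f"format {fmt!r} is {info[0]}{info[1]}, but bitwidth is {bitwidth}"
        )
    return problems


# The tuple from Out[62] above, with DtypeKind.CATEGORICAL written as 23.
print(check_dtype((23, 32, "L", "=")))
# An int32 format string would at least agree with the reported bit width.
print(check_dtype((23, 32, "i", "=")))
```

With the tuple from the session above, the first call flags that "L" implies 64 bits while the reported bit width is 32; the second call returns an empty list.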
cc @AlenkaF