jorisvandenbossche commented on issue #37476: URL: https://github.com/apache/arrow/issues/37476#issuecomment-1700486940
Thanks for the report. I think the problem is that the Python-to-Arrow conversion (`python_to_arrow.cc`) uses a DictionaryBuilder under the hood, which is created here:

https://github.com/apache/arrow/blob/9b6be29f431705ce1f85cc218c66d4d03698f06b/cpp/src/arrow/builder.cc#L312-L320

This passes `exact_index_type = false`, which essentially means it uses an adaptive int builder (one that starts at the bit width you specified, but can still grow, e.g. from int32 to int64, if needed).

Maybe one way to fix the signed vs. unsigned change is to let it use an AdaptiveIntBuilder or an AdaptiveUIntBuilder, depending on the signedness of the original index type. That would preserve the signedness while keeping the ability to grow the bit width if that is needed to convert the data to a dictionary type.
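To make the idea of "grow the width, keep the signedness" concrete, here is a small pure-Python model of the selection rule. It is only a sketch of the behaviour being proposed, not Arrow's implementation: `max_index`, `adapt_index_type`, and `type_name` are hypothetical helpers, and the set of widths (8, 16, 32, 64) is an assumption about what the adaptive builders step through.

```python
from typing import Tuple

# Toy model of the proposal: widen the index type when needed, but never
# flip its signedness. Not Arrow code; all names here are hypothetical.
WIDTHS = (8, 16, 32, 64)


def max_index(signed: bool, bit_width: int) -> int:
    """Largest dictionary index representable by the given integer type."""
    return 2 ** (bit_width - 1) - 1 if signed else 2 ** bit_width - 1


def adapt_index_type(signed: bool, bit_width: int, num_values: int) -> Tuple[bool, int]:
    """Return (signed, bit_width) able to index `num_values` dictionary entries.

    The search starts at the requested width and only ever widens;
    the signedness of the requested index type is kept as-is.
    """
    if bit_width not in WIDTHS:
        raise ValueError(f"unsupported bit width: {bit_width}")
    largest = max(num_values - 1, 0)  # indices run from 0 to num_values - 1
    for width in WIDTHS:
        if width >= bit_width and largest <= max_index(signed, width):
            return signed, width
    raise OverflowError("too many dictionary values for a 64-bit index")


def type_name(signed: bool, bit_width: int) -> str:
    """Format the pair as a type name such as 'int16' or 'uint8'."""
    return f"{'int' if signed else 'uint'}{bit_width}"


if __name__ == "__main__":
    # 300 distinct values do not fit in uint8, so the width grows,
    # but the result stays unsigned.
    print(type_name(*adapt_index_type(False, 8, 300)))
    # The same request with a signed index stays signed.
    print(type_name(*adapt_index_type(True, 8, 300)))
```

Under this rule, a requested uint8 index type would end up as uint8 or a wider unsigned type, never as a signed one.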
