[ https://issues.apache.org/jira/browse/ARROW-14564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Will Jones resolved ARROW-14564.
--------------------------------
Resolution: Not A Problem
> [python] uint32 incorrectly saves to Parquet as int64
> -----------------------------------------------------
>
> Key: ARROW-14564
> URL: https://issues.apache.org/jira/browse/ARROW-14564
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 6.0.0
> Environment: Ubuntu 20.10, Python 3.8.10
> Reporter: Bruce Allen
> Priority: Major
> Attachments: test_u32.py
>
>
> The function pyarrow.parquet.write_table() incorrectly saves data of type
> unsigned int32 (uint32) as signed int64. The attached script test_u32.py
> demonstrates the failure.
> Output from running test_u32.py, showing the unexpected type change:
> pyarrow version: 6.0.0
> numpy data:
> [(1, 2) (3, 4)]
> [('my_u2', '<u2'), ('my_u4', '<u4')]
> result:
>    my_u2  my_u4
> 0      1      2
> 1      3      4
> my_u2    uint16
> my_u4     int64
> dtype: object
>
> The incorrect int64 type is also present in the Parquet file itself, as the
> "parq" tool shows:
> $ parq _test_u32_pq --schema
> # Schema
> <pyarrow._parquet.ParquetSchema object at 0x7ff2e40b2a40>
> required group field_id=-1 schema {
>   optional int32 field_id=-1 my_u2 (Int(bitWidth=16, isSigned=false));
>   optional int64 field_id=-1 my_u4;
> }
--
This message was sent by Atlassian Jira
(v8.20.1#820001)