[jira] [Commented] (ARROW-9215) pyarrow parquet writer converts uint32 columns to int64

Uwe Korn (Jira) Wed, 03 Feb 2021 03:01:05 -0800


    [ 
https://issues.apache.org/jira/browse/ARROW-9215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277907#comment-17277907
 ]


Uwe Korn commented on ARROW-9215:
---------------------------------

For uint64, we have no better option than int64, so we have to accept that 
values above the int64 maximum will overflow. For uint32, some values don't 
fit into int32, but every possible value fits inside the int64 range, so we 
can avoid overflows by upcasting to int64.
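
To make the range argument concrete, here is a minimal sketch in plain Python. It only illustrates the integer ranges involved, not what the pyarrow writer does internally; the helper names (fits_in_int64, reinterpret_uint64_as_int64) are hypothetical and exist only for this example.

```python
import struct

# Largest values representable by each integer type.
UINT32_MAX = 2**32 - 1
UINT64_MAX = 2**64 - 1
INT32_MAX = 2**31 - 1
INT64_MAX = 2**63 - 1
INT64_MIN = -2**63


def fits_in_int64(value):
    """Return True if value is representable as a signed 64-bit integer."""
    return INT64_MIN <= value <= INT64_MAX


def reinterpret_uint64_as_int64(value):
    """Reinterpret the 8 bytes of an unsigned 64-bit value as signed."""
    return struct.unpack("<q", struct.pack("<Q", value))[0]


# uint32 can exceed int32, so int32 is not a safe storage type for it.
uint32_exceeds_int32 = UINT32_MAX > INT32_MAX

# Every uint32 value fits into int64, so upcasting is lossless.
uint32_fits_int64 = fits_in_int64(UINT32_MAX)

# The upper half of the uint64 range does not fit into int64.
uint64_fits_int64 = fits_in_int64(UINT64_MAX)

# Storing such a value in 64 signed bits wraps it to a negative number.
wrapped = reinterpret_uint64_as_int64(UINT64_MAX)

print(uint32_exceeds_int32, uint32_fits_int64, uint64_fits_int64, wrapped)
```

The last value shows the kind of overflow meant for uint64: the bit pattern is preserved, but the signed interpretation turns negative.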

> pyarrow parquet writer converts uint32 columns to int64
> -------------------------------------------------------
>
>                 Key: ARROW-9215
>                 URL: https://issues.apache.org/jira/browse/ARROW-9215
>             Project: Apache Arrow
>          Issue Type: Bug
>            Reporter: Devavret Makkar
>            Assignee: Uwe Korn
>            Priority: Major
>
> pyarrow parquet writer changes uint32 columns to int64. This change is not 
> made for other types and uint8, uint16, and uint64 columns retain their type.
> {code:python}
> In [1]: import pandas as pd
> In [2]: import pyarrow as pa
> In [3]: import pyarrow.parquet as pq
> In [5]: df = pd.DataFrame({'a':pd.Series([1,2,3], dtype='uint32')})
> In [6]: padf = pa.Table.from_pandas(df)
> In [7]: padf
> Out[7]: 
> pyarrow.Table
> a: uint32
> In [8]: pq.write_table(padf, 'pa.parquet')
> In [9]: pq.read_table('pa.parquet')
> Out[9]: 
> pyarrow.Table
> a: int64
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
