[
https://issues.apache.org/jira/browse/ARROW-1446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Wes McKinney resolved ARROW-1446.
---------------------------------
Resolution: Fixed
Issue resolved by pull request 1055
[https://github.com/apache/arrow/pull/1055]
> Python: Writing more than 2^31 rows from pandas dataframe causes row count
> overflow error
> -----------------------------------------------------------------------------------------
>
> Key: ARROW-1446
> URL: https://issues.apache.org/jira/browse/ARROW-1446
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 0.6.0
> Reporter: James Porritt
> Assignee: Wes McKinney
> Fix For: 0.7.0
>
>
> I have the following code:
> {code}
> import pyarrow
> import pyarrow.parquet as pq
>
> client = pyarrow.HdfsClient("<host>", <port>, "<user>", driver='libhdfs3')
> abc_table = client.read_parquet('<source parquet>', nthreads=16)
> abc_df = abc_table.to_pandas()
> abc_table = pyarrow.Table.from_pandas(abc_df)
> with client.open('<target parquet>', 'wb') as f:
>     pq.write_table(abc_table, f)
> {code}
> <source parquet> contains 2497301128 rows.
> During the write however I get the following error:
> {noformat}
> Traceback (most recent call last):
>   File "pyarrow_cluster.py", line 29, in <module>
>     main()
>   File "pyarrow_cluster.py", line 26, in main
>     pq.write_table(nmi_table, f)
>   File "<home dir>/miniconda2/envs/parquet/lib/python2.7/site-packages/pyarrow/parquet.py", line 796, in write_table
>     writer.write_table(table, row_group_size=row_group_size)
>   File "_parquet.pyx", line 663, in pyarrow._parquet.ParquetWriter.write_table
>   File "error.pxi", line 72, in pyarrow.lib.check_status
> pyarrow.lib.ArrowIOError: Written rows: -1797666168 != expected rows: 2497301128 in the current column chunk
> {noformat}
> The reported written-row count suggests that a signed 32-bit integer has
> overflowed.
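The overflow hypothesis checks out numerically: 2497301128 exceeds 2^31 - 1, and reinterpreting it as a signed 32-bit integer yields exactly the -1797666168 reported in the traceback. A minimal sketch of that arithmetic (plain Python, no pyarrow required):

```python
import ctypes

# Row count from the source parquet file, per the report above.
expected_rows = 2497301128

# ctypes.c_int32 silently wraps values modulo 2**32 and interprets the
# result as signed, mimicking a 32-bit signed counter overflowing.
wrapped = ctypes.c_int32(expected_rows).value

print(wrapped)  # -1797666168, matching "Written rows" in the error
```

Equivalently, 2497301128 - 2**32 = -1797666168, which is why a row count held in an `int32` comes back negative once the table passes 2^31 rows.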
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)