[ https://issues.apache.org/jira/browse/ARROW-723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15950610#comment-15950610 ]
Jonathan Chambers commented on ARROW-723:
-----------------------------------------
I don't have time to explore this much unfortunately, but I can share a few
observations:
- When run in an interactive console, this bug completely freezes the process:
it cannot be interrupted with Ctrl-C etc., so the process has to be killed.
- When run through PyCharm's interactive debugger, it gets to the write file
line and then freezes. There is no way to continue debugging, and the debugger
has to be killed.
- A test.pq file is written that seems to increase in size indefinitely if you
let the process run.
- CPU usage is 100%.
- Together these suggest that pyarrow gets stuck in an infinite loop somewhere
in the C part of the code.
Hope that helps
> Arrow freezes on write if chunk_size=0
> --------------------------------------
>
> Key: ARROW-723
> URL: https://issues.apache.org/jira/browse/ARROW-723
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 0.2.0
> Environment: Linux, macOS
> Reporter: Jonathan Chambers
>
> Pyarrow freezes if you set chunk_size=0 (e.g. if you forget to account for
> short data when setting chunk size as a function of table length, see
> example).
> I would expect pyarrow either to handle this gracefully (e.g. revert to the
> behaviour of chunk_size=None) or to throw an error.
> ```
> import numpy as np
> import pandas as pd
> import pyarrow as pa
> import pyarrow.parquet as pq
> cols = 'A', 'B', 'C', 'D'
> row = np.arange(4)
> data = pd.DataFrame([row], columns=cols)
> table = pa.Table.from_pandas(data.reset_index(), timestamps_to_ms=True)
> # data has a single row, so int(len(data) / 4) evaluates to 0
> pq.write_table(table, 'test.pq', chunk_size=int(len(data) / 4))
> ```
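Until the library validates this argument itself, a caller-side guard may avoid the hang. The sketch below is only an illustration: `safe_chunk_size` is a hypothetical helper, not part of pyarrow, and it assumes, as the report implies, that `chunk_size=None` selects the default behaviour.

```python
def safe_chunk_size(n_rows, n_chunks):
    """Return a chunk size of at least 1, or None when there are no rows.

    Hypothetical helper (not a pyarrow API). None is assumed to mean
    "use the writer's default", as suggested in the report.
    """
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    if n_rows == 0:
        return None
    # Integer division can yield 0 for short tables, so clamp to 1.
    return max(1, n_rows // n_chunks)


print(safe_chunk_size(1, 4))   # 1 (the one-row case from the report)
print(safe_chunk_size(10, 4))  # 2
```

With the example from the report, the call would then read `pq.write_table(table, 'test.pq', chunk_size=safe_chunk_size(len(data), 4))`.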