Robert Dailey created ARROW-1938:
------------------------------------
Summary: Error writing to partitioned dataset
Key: ARROW-1938
URL: https://issues.apache.org/jira/browse/ARROW-1938
Project: Apache Arrow
Issue Type: Bug
Components: Python
Affects Versions: 0.8.0
Environment: Linux (Ubuntu 16.04)
Reporter: Robert Dailey
Attachments: pyarrow_dataset_error.png
After upgrading to pyarrow 0.8.0, I receive the following error when writing to
a partitioned dataset:
* ArrowIOError: Column 3 had 187374 while previous column had 10000
The command was:
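import pyarrow as pa
import pyarrow.parquet as pq
# df is an existing pandas DataFrame that contains the partition columns
write_table_values = {'row_group_size': 10000}
pq.write_to_dataset(pa.Table.from_pandas(df, preserve_index=True),
                    '/logs/parsed/test',
                    partition_cols=['Product', 'year', 'month', 'day', 'hour'],
                    **write_table_values)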
The same command works in version 0.7.1. I am still troubleshooting the
problem but wanted to submit a ticket in the meantime.
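
For context, the error text reads like a row-count consistency check: within
one row group, every column is expected to hold the same number of values. The
sketch below is a plain-Python illustration of that idea only, not pyarrow's or
parquet-cpp's actual implementation. The helper names row_group_bounds and
check_row_group are hypothetical, and the sketch assumes that 187374 is the
total row count of the table (or partition) being written, which the report
does not state.

```python
def row_group_bounds(num_rows, row_group_size):
    """Yield (start, stop) row offsets for each row group (hypothetical helper)."""
    for start in range(0, num_rows, row_group_size):
        yield start, min(start + row_group_size, num_rows)


def check_row_group(column_lengths):
    """Raise if the columns of one row group disagree on row count.

    Hypothetical helper; the message format mirrors the reported error text.
    """
    for i in range(1, len(column_lengths)):
        if column_lengths[i] != column_lengths[i - 1]:
            raise ValueError(
                "Column %d had %d while previous column had %d"
                % (i, column_lengths[i], column_lengths[i - 1])
            )


# Assumption: 187374 total rows, split with row_group_size=10000.
bounds = list(row_group_bounds(187374, 10000))

# Columns 0-2 sliced to the row group size, column 3 passed through unsliced.
try:
    check_row_group([10000, 10000, 10000, 187374])
    message = None
except ValueError as exc:
    message = str(exc)
```

Under these assumptions the table splits into 18 full row groups plus a final
one of 7374 rows. If one column were handed to the writer without being sliced
to the row group size, a check of this kind would fail with the same wording
as the reported error.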
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)