Ying Wang created ARROW-3208:
--------------------------------
Summary: Segmentation fault when reading a Parquet partitioned dataset to a Parquet file
Key: ARROW-3208
URL: https://issues.apache.org/jira/browse/ARROW-3208
Project: Apache Arrow
Issue Type: Bug
Components: Python
Affects Versions: 0.9.0
Environment: Ubuntu 16.04 LTS; System76 Oryx Pro
Reporter: Ying Wang
Steps to reproduce:
# Create a partitioned dataset with the following code:
```python
import numpy as np
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq
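# 'one' mixes ints and floats, so pandas stores it as float64; 'two' is int64
df = pd.DataFrame({
    'one': [-1, 10, 2.5, 100, 1000, 1, 29.2],
    'two': [-1, 10, 2, 100, 1000, 1, 11],
    'three': [0, 0, 0, 0, 0, 0, 0],
})
table = pa.Table.from_pandas(df)
# Write one subdirectory per distinct (one, two) combination
pq.write_to_dataset(table, root_path='/home/yingw787/misc/example_dataset',
                    partition_cols=['one', 'two'])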
```
# Read the partitioned dataset back into a PyArrow Table and write that Table to a single Parquet file:
```python
import pyarrow.parquet as pq
# '/path/to/dataset' stands for the root_path used in step 1
table = pq.ParquetDataset('/path/to/dataset').read()
pq.write_table(table, '/path/to/example.parquet')
```
EXPECTED:
* Successful write
GOT:
* Segmentation fault
Issue reference on GitHub mirror: https://github.com/apache/arrow/issues/2511
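For context on what step 2 reads back, the following pure-Python sketch lists the leaf directories that step 1 is expected to create. It assumes Hive-style `column=value` directory naming, values rendered with Python's default string formatting (with `one` upcast to float64 by pandas), and that partition columns are dropped from the stored files; the exact value formatting may differ between pyarrow versions. The helper names `hive_leaf_dirs` and `stored_columns` are hypothetical and not part of pyarrow, and the sketch does not exercise the segfault itself.

```python
# Hypothetical helpers illustrating the expected on-disk layout of the
# dataset from step 1; this is NOT pyarrow code.

# Column values as pandas stores them: 'one' mixes ints and floats,
# so it is upcast to float64; 'two' and 'three' stay int64.
columns = {
    'one': [float(v) for v in [-1, 10, 2.5, 100, 1000, 1, 29.2]],
    'two': [-1, 10, 2, 100, 1000, 1, 11],
    'three': [0, 0, 0, 0, 0, 0, 0],
}
partition_cols = ['one', 'two']


def hive_leaf_dirs(data, part_cols):
    """Return the sorted 'col=value/col=value' leaf directories.

    Assumes each value is formatted with str(), which is an
    approximation of what write_to_dataset does.
    """
    n_rows = len(data[part_cols[0]])
    dirs = set()
    for i in range(n_rows):
        parts = ['{}={}'.format(col, data[col][i]) for col in part_cols]
        dirs.add('/'.join(parts))
    return sorted(dirs)


def stored_columns(data, part_cols):
    """Columns left inside each leaf file once partition columns are removed."""
    return [col for col in data if col not in part_cols]


if __name__ == '__main__':
    for d in hive_leaf_dirs(columns, partition_cols):
        print(d)
    print('columns stored in each file:', stored_columns(columns, partition_cols))
```

Because every (one, two) pair in step 1 is distinct, this layout has seven leaf directories, each holding a file with only the `three` column; `ParquetDataset` then reconstructs `one` and `two` from the directory names when reading.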