Steps to reproduce:
1. Create a partitioned dataset with the following code:
```python
import numpy as np
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq
df = pd.DataFrame({
    'one': [-1, 10, 2.5, 100, 1000, 1, 29.2],
    'two': [-1, 10, 2, 100, 1000, 1, 11],
    'three': [0, 0, 0, 0, 0, 0, 0]
})
table = pa.Table.from_pandas(df)
pq.write_to_dataset(table, root_path='/home/yingw787/misc/example_dataset',
partition_cols=['one', 'two'])
```
2. Read the partitioned dataset back into a PyArrow Table and write it out
as a single Parquet file:
```python
import pyarrow.parquet as pq
table = pq.ParquetDataset('/home/yingw787/misc/example_dataset').read()
pq.write_table(table, '/home/yingw787/misc/example.parquet')
```
EXPECTED:
- Successful write
GOT:
- Segmentation fault
[ Full content available at: https://github.com/apache/arrow/issues/2511 ]