Florian Jetter created ARROW-4267:
-------------------------------------

Summary: [Python/C++] Segfault when reading rowgroups with duplicated columns
Key: ARROW-4267
URL: https://issues.apache.org/jira/browse/ARROW-4267
Project: Apache Arrow
Issue Type: Bug
Affects Versions: 0.11.1
Reporter: Florian Jetter
When I read a row group and request the same column more than once, I get a segfault. Reading the whole row group, or requesting the column once, works fine.

{code:python}
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

# Write a single-column DataFrame to an in-memory Parquet buffer
df = pd.DataFrame({
    "col": ["A", "B"]
})
table = pa.Table.from_pandas(df)
buf = pa.BufferOutputStream()
pq.write_table(table, buf)

parquet_file = pq.ParquetFile(buf.getvalue())
parquet_file.read_row_group(0)                   # works: all columns
parquet_file.read_row_group(0, columns=["col"])  # works: "col" requested once
# boom: segfault when "col" is requested twice
parquet_file.read_row_group(0, columns=["col", "col"])
{code}
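Until this is fixed, one possible workaround is to drop duplicate names from the column list before reading. This is a sketch that assumes the caller only needs each column once. The helper name dedupe_columns is hypothetical and is not part of pyarrow.

```python
from typing import Iterable, List


def dedupe_columns(columns: Iterable[str]) -> List[str]:
    """Return column names with duplicates removed, keeping first-seen order.

    Hypothetical helper (not part of pyarrow), meant to sanitize the
    ``columns`` argument before calling ParquetFile.read_row_group on
    versions affected by ARROW-4267.
    """
    # dict preserves insertion order (Python 3.7+), so fromkeys drops repeats
    return list(dict.fromkeys(columns))


print(dedupe_columns(["col", "col"]))  # ['col']
```

With this helper, parquet_file.read_row_group(0, columns=dedupe_columns(["col", "col"])) reduces to the single-column call that works in the reproduction above. If the output really needs the column twice, it would have to be duplicated after reading.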