[ https://issues.apache.org/jira/browse/ARROW-8276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17071140#comment-17071140 ]
Joris Van den Bossche commented on ARROW-8276:
----------------------------------------------
Reproducer in Python:
{code}
import pyarrow as pa
import pyarrow.parquet as pq
import pyarrow.dataset as ds
import pathlib
# create small partitioned dataset
table = pa.table({'col1': [1, 2, 3]})
basedir = pathlib.Path(".")
dataset_dir = basedir / "test_partitioned_fragment"
dataset_dir.mkdir(exist_ok=True)
(dataset_dir / "A=0").mkdir(exist_ok=True)
(dataset_dir / "A=1").mkdir(exist_ok=True)
pq.write_table(table, dataset_dir / "A=0" / "data.parquet")
pq.write_table(table, dataset_dir / "A=1" / "data.parquet")
# read it with the datasets API
dataset = ds.dataset(str(dataset_dir), format="parquet", partitioning="hive")
dataset.schema
dataset.to_table()
# reading one fragment fails
fragments = list(dataset.get_fragments())
fragments[0].to_table()
{code}
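For context: in the layout above, column {{A}} exists only in the directory names ({{A=0}}, {{A=1}}), not in the Parquet files themselves, so scanning a single fragment has to pick it up from the partition information. Below is a minimal, standard-library-only sketch of how hive-style path segments map to partition values. {{parse_hive_partitions}} is a hypothetical helper written for illustration, not a pyarrow function, and it keeps values as strings rather than inferring types.

```python
import pathlib


def parse_hive_partitions(path, base):
    """Return {key: value} for each "key=value" directory between base and the file.

    Hypothetical helper for illustration only; not part of pyarrow.
    """
    relative = pathlib.PurePosixPath(path).relative_to(base)
    partitions = {}
    for segment in relative.parts[:-1]:  # skip the file name itself
        key, sep, value = segment.partition("=")
        if sep:  # ignore directories that are not of the form key=value
            partitions[key] = value  # kept as a string, no type inference
    return partitions


print(parse_hive_partitions(
    "test_partitioned_fragment/A=0/data.parquet",
    "test_partitioned_fragment",
))
```

A fragment for {{A=0/data.parquet}} would be expected to yield {{col1}} from the file plus {{A}} from the path.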
> [C++][Dataset] Scanning a Fragment does not take into account the partition columns
> -----------------------------------------------------------------------------------
>
> Key: ARROW-8276
> URL: https://issues.apache.org/jira/browse/ARROW-8276
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++, C++ - Dataset
> Reporter: Joris Van den Bossche
> Assignee: Ben Kietzman
> Priority: Major
> Fix For: 0.17.0
>
>
> Follow-up on ARROW-8061: the {{to_table}} method doesn't work for fragments
> created from a partitioned dataset.
> (Will add a reproducer later.)
> cc [~bkietz]