[ https://issues.apache.org/jira/browse/ARROW-16564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17536738#comment-17536738 ]
Aaron Philip commented on ARROW-16564:
--------------------------------------
If I use the new schema, is there any way for PyArrow to ignore files that
don't conform to that schema?
> [Python] Add option to have dataset infer the parquet schema from the last file
> -------------------------------------------------------------------------------
>
> Key: ARROW-16564
> URL: https://issues.apache.org/jira/browse/ARROW-16564
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Python
> Affects Versions: 7.0.0, 8.0.0
> Reporter: Aaron Philip
> Priority: Minor
>
> According to
> [https://arrow.apache.org/docs/python/dataset.html#dataset-discovery],
> dataset discovery infers the Parquet schema from the first file in the path.
> I have a situation where a column was added to the schema after a certain
> date. As a result, when I try to read the Parquet files in this path, the new
> column is ignored because it is not part of the schema of the first file in
> that path.
> I would like the option to infer the schema based on the last file in the
> path to avoid this issue.
--
This message was sent by Atlassian Jira
(v8.20.7#820007)