[ https://issues.apache.org/jira/browse/PARQUET-241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977329#comment-14977329 ]
Ryan Blue commented on PARQUET-241:
-----------------------------------
[~skonto], I think most formats are consistent by accident, but that
consistency isn't guaranteed. This change would probably make the order of
collect() results in Spark more consistent.
> ParquetInputFormat.getFooters() should return in the same order as what
> listStatus() returns
> --------------------------------------------------------------------------------------------
>
> Key: PARQUET-241
> URL: https://issues.apache.org/jira/browse/PARQUET-241
> Project: Parquet
> Issue Type: Bug
> Affects Versions: 1.6.0
> Reporter: Mingyu Kim
>
> Because of how the footer cache is implemented, getFooters() returns the
> footers in a different order from the one listStatus() returns.
> When I provided the URL
> "hdfs://.../part-00001.parquet,hdfs://.../part-00002.parquet,hdfs://.../part-00003.parquet",
> ParquetInputFormat.getSplits(), which internally calls getFooters(),
> returned the splits in the wrong order.
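A minimal Java sketch of the ordering problem and the fix the title asks for. It assumes, for illustration only, that footers are cached in a HashMap keyed by file path; the names FooterOrderSketch, Footer, getFootersFromCache and getFootersInOrder are hypothetical and are not Parquet's API. Reading results back by iterating the cache follows hash order, while walking the listStatus() order and looking each footer up keeps the two lists aligned.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Parquet's implementation: a footer cache keyed by
// file path, plus two ways of reading footers back out of it.
public class FooterOrderSketch {

    // Hypothetical stand-in for a parsed Parquet footer (only the path is kept).
    static final class Footer {
        private final String path;

        Footer(String path) {
            this.path = path;
        }

        String path() {
            return path;
        }
    }

    // HashMap iteration order depends on key hashes, not on insertion order.
    private final Map<String, Footer> cache = new HashMap<>();

    // Stand-in for reading a footer from disk on a cache miss.
    private Footer load(String path) {
        return cache.computeIfAbsent(path, Footer::new);
    }

    // Order-losing pattern: populate the cache, then return all of its values.
    // The result follows hash order, so it can differ from the input order
    // (and it also includes footers cached by earlier calls).
    List<Footer> getFootersFromCache(List<String> paths) {
        for (String p : paths) {
            load(p);
        }
        return new ArrayList<>(cache.values());
    }

    // Order-preserving pattern: walk the listStatus() order and look each
    // footer up, so result.get(i) always belongs to paths.get(i).
    List<Footer> getFootersInOrder(List<String> paths) {
        List<Footer> result = new ArrayList<>(paths.size());
        for (String p : paths) {
            result.add(load(p));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> listed = List.of(
                "part-00001.parquet", "part-00002.parquet", "part-00003.parquet");
        FooterOrderSketch sketch = new FooterOrderSketch();
        // Prints the three paths in the order they were listed.
        for (Footer f : sketch.getFootersInOrder(listed)) {
            System.out.println(f.path());
        }
    }
}
```

Switching the cache to a LinkedHashMap would only preserve the order in which paths were first inserted, so a later call listing already-cached files in a different order would still come back misordered; iterating the input list avoids depending on the cache's history at all.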
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)