[
https://issues.apache.org/jira/browse/PARQUET-241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977266#comment-14977266
]
Stavros Kontopoulos commented on PARQUET-241:
---------------------------------------------
Does this affect the actual order of data shown in Spark? For example, when
calling collect() on a Parquet file loaded as a DataFrame. HDFS seems to
preserve the order of data. Is the order of data out of the scope of Parquet?
> ParquetInputFormat.getFooters() should return in the same order as what
> listStatus() returns
> --------------------------------------------------------------------------------------------
>
> Key: PARQUET-241
> URL: https://issues.apache.org/jira/browse/PARQUET-241
> Project: Parquet
> Issue Type: Bug
> Affects Versions: 1.6.0
> Reporter: Mingyu Kim
>
> Because of how the footer cache is implemented, getFooters() returns the
> footers in a different order than what listStatus() returns.
> When I provided the URL
> "hdfs://.../part-00001.parquet,hdfs://.../part-00002.parquet,hdfs://.../part-00003.parquet",
> ParquetInputFormat.getSplits(), which internally calls getFooters(),
> returned the splits in the wrong order.
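
A minimal sketch of the ordering problem described in the issue, not the
actual Parquet code or its fix. It assumes that the footer cache behaves like
an unordered map keyed by file path, and it uses plain strings in place of the
real file status and footer objects. The class name FooterOrderSketch and the
helper footersInListedOrder are hypothetical. The idea is to walk the listing
in its original order and look each footer up by path, rather than iterating
over the cache itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FooterOrderSketch {

    // Hypothetical helper: returns one footer per listed path, in the same
    // order as the listing, regardless of the cache's own iteration order.
    static List<String> footersInListedOrder(List<String> listedPaths,
                                             Map<String, String> footerCache) {
        List<String> ordered = new ArrayList<>(listedPaths.size());
        for (String path : listedPaths) {
            String footer = footerCache.get(path);
            if (footer == null) {
                throw new IllegalStateException("No footer cached for " + path);
            }
            ordered.add(footer);
        }
        return ordered;
    }

    public static void main(String[] args) {
        // Stand-in for the order returned by a directory listing.
        List<String> listed = Arrays.asList(
                "part-00001.parquet", "part-00002.parquet", "part-00003.parquet");

        // Stand-in for a footer cache; a HashMap gives no ordering guarantee.
        Map<String, String> cache = new HashMap<>();
        cache.put("part-00003.parquet", "footer-3");
        cache.put("part-00001.parquet", "footer-1");
        cache.put("part-00002.parquet", "footer-2");

        // Footers come back in listing order, not cache order.
        System.out.println(footersInListedOrder(listed, cache));
    }
}
```

Whether the real implementation can be corrected this way depends on how the
footer cache is keyed, which the issue does not state.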
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)