[ 
https://issues.apache.org/jira/browse/PARQUET-241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14978699#comment-14978699
 ] 

Ryan Blue commented on PARQUET-241:
-----------------------------------

Building 1.7.0 shouldn't make a difference because this issue is still 
unresolved. There are specs for Parquet, but nothing that covers this behavior. 
The order of the listStatus results probably depends on the order in which the 
files were created, as on most file systems. Fixing this issue would only 
guarantee that the footers come back in the same order as the file status array.
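
To illustrate what "same order as the file status array" would amount to, here 
is a minimal, self-contained sketch. It is not the Parquet implementation: the 
Footer class, the inListingOrder helper, and the file names are hypothetical 
stand-ins, and plain strings take the place of Hadoop FileStatus objects. The 
idea is only to index the footers by path and then walk the listing in its own 
order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FooterOrderSketch {

    // Hypothetical stand-in for a footer; only the originating path matters here.
    static final class Footer {
        final String path;

        Footer(String path) {
            this.path = path;
        }

        @Override
        public String toString() {
            return "Footer(" + path + ")";
        }
    }

    // Returns the footers rearranged to follow the order of listedPaths.
    // Listed paths that have no matching footer are skipped.
    static List<Footer> inListingOrder(List<String> listedPaths,
                                       Collection<Footer> footers) {
        // Index footers by path so each lookup is independent of cache order.
        Map<String, Footer> byPath = new HashMap<>();
        for (Footer footer : footers) {
            byPath.put(footer.path, footer);
        }
        // Walk the listing in its own order and pick up the matching footer.
        List<Footer> ordered = new ArrayList<>(listedPaths.size());
        for (String path : listedPaths) {
            Footer footer = byPath.get(path);
            if (footer != null) {
                ordered.add(footer);
            }
        }
        return ordered;
    }

    public static void main(String[] args) {
        // Placeholder file names standing in for the listing order.
        List<String> listed = Arrays.asList(
            "part-00001.parquet", "part-00002.parquet", "part-00003.parquet");
        // Footers as they might come back from a cache, in a different order.
        List<Footer> fromCache = Arrays.asList(
            new Footer("part-00003.parquet"),
            new Footer("part-00001.parquet"),
            new Footer("part-00002.parquet"));
        System.out.println(inListingOrder(listed, fromCache));
    }
}
```

Note that this only makes the footer order follow the listing. Whatever order 
the listing itself comes back in is carried through unchanged.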

> ParquetInputFormat.getFooters() should return in the same order as what 
> listStatus() returns
> --------------------------------------------------------------------------------------------
>
>                 Key: PARQUET-241
>                 URL: https://issues.apache.org/jira/browse/PARQUET-241
>             Project: Parquet
>          Issue Type: Bug
>    Affects Versions: 1.6.0
>            Reporter: Mingyu Kim
>
> Because of how the footer cache is implemented, getFooters() returns the 
> footers in a different order than what listStatus() returns.
> When I provided the URLs 
> "hdfs://.../part-00001.parquet,hdfs://.../part-00002.parquet,hdfs://.../part-00003.parquet",
>  ParquetInputFormat.getSplits(), which internally calls getFooters(), 
> returned the splits in the wrong order.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
