[jira] [Commented] (PARQUET-241) ParquetInputFormat.getFooters() should return in the same order as what listStatus() returns

Stavros Kontopoulos (JIRA) Tue, 27 Oct 2015 15:15:19 -0700

    [ 
https://issues.apache.org/jira/browse/PARQUET-241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14977266#comment-14977266
 ]


Stavros Kontopoulos commented on PARQUET-241:
---------------------------------------------

Does this affect the actual order of the data as seen in Spark? For example, 
when calling collect() on a Parquet file loaded as a DataFrame. HDFS seems to 
preserve the order of the data. Is the ordering of data out of scope for Parquet? 

> ParquetInputFormat.getFooters() should return in the same order as what 
> listStatus() returns
> --------------------------------------------------------------------------------------------
>
>                 Key: PARQUET-241
>                 URL: https://issues.apache.org/jira/browse/PARQUET-241
>             Project: Parquet
>          Issue Type: Bug
>    Affects Versions: 1.6.0
>            Reporter: Mingyu Kim
>
> Because of how the footer cache is implemented, getFooters() returns the 
> footers in a different order than what listStatus() returns.
> When I provided the paths 
> "hdfs://.../part-00001.parquet,hdfs://.../part-00002.parquet,hdfs://.../part-00003.parquet",
> ParquetInputFormat.getSplits(), which internally calls getFooters(), 
> returned the splits in the wrong order.
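
For illustration only, here is a minimal sketch of how a cache-backed lookup can still return results in the caller's order. It does not use the Parquet API: the class FooterOrderSketch, the method footersInInputOrder, the string "footers", and the bare file names are all hypothetical stand-ins, and the actual fix in Parquet may look quite different. The idea is to iterate over the input list (the listStatus() order) rather than over the cache, whose iteration order is unspecified.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Parquet code: strings stand in for file statuses and footers.
class FooterOrderSketch {

    // Returns one footer per input path, in the same order as inputPaths.
    // The cache is only consulted per path; its own iteration order is never used.
    static List<String> footersInInputOrder(List<String> inputPaths, Map<String, String> cache) {
        List<String> result = new ArrayList<>(inputPaths.size());
        for (String path : inputPaths) {
            // Assumption: on a cache miss we "read" the footer; here that is a placeholder string.
            String footer = cache.computeIfAbsent(path, p -> "footer-of-" + p);
            result.add(footer);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> cache = new HashMap<>();
        // Pretend the third file was read earlier and is already cached.
        cache.put("part-00003.parquet", "cached-footer-3");

        List<String> paths = Arrays.asList(
                "part-00001.parquet", "part-00002.parquet", "part-00003.parquet");

        // The output follows the order of `paths`, regardless of what was cached first.
        System.out.println(footersInInputOrder(paths, cache));
    }
}
```

Building the result by walking the cache (or by appending cache hits and misses as two separate groups) is what would typically lose the input order; walking the input list avoids that.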



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
