[ https://issues.apache.org/jira/browse/SPARK-6795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheng Lian reassigned SPARK-6795:
---------------------------------

    Assignee: Cheng Lian

> Avoid reading Parquet footers on driver side when a global arbitrative schema is available
> -------------------------------------------------------------------------------------------
>
>                 Key: SPARK-6795
>                 URL: https://issues.apache.org/jira/browse/SPARK-6795
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.0.2, 1.1.1, 1.2.1, 1.3.1
>            Reporter: Cheng Lian
>            Assignee: Cheng Lian
>            Priority: Critical
>
> With the help of
> [Parquet MR PR #91|https://github.com/apache/incubator-parquet-mr/pull/91],
> which will be included in the official Parquet MR 1.6.0 release, it's now
> possible to avoid reading footers on the driver side entirely when a global
> arbitrative schema is available.
> Currently, the global schema can be either the Hive metastore schema or a
> schema specified via data source DDL. Each task should verify the Parquet
> data files it reads and reconcile any schema conflicts locally against this
> global schema.
> However, when no global schema is available and schema merging is enabled, we 
> still need to read schemas from all data files to infer a valid global schema.
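
A rough sketch of the decision described above, written in Python rather than Spark's own Scala and assuming a toy model where a schema is just a dict from column name to type name. `driver_schema`, `reconcile_on_task`, `merge_schemas` and the `read_footer` callback are hypothetical names for illustration, not Spark or Parquet MR APIs. Treating any type mismatch as a hard conflict, reading missing columns as nulls, and taking one file's schema when merging is disabled are all simplifying assumptions; the issue itself does not spell those cases out.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical toy model: a schema maps column name -> type name.
Schema = Dict[str, str]


def merge_schemas(schemas: List[Schema]) -> Schema:
    """Union of all columns; the same column with two types is a conflict (simplification)."""
    merged: Schema = {}
    for schema in schemas:
        for name, typ in schema.items():
            if name in merged and merged[name] != typ:
                raise ValueError(f"column {name!r}: {merged[name]} vs {typ}")
            merged[name] = typ
    return merged


def driver_schema(paths: List[str],
                  read_footer: Callable[[str], Schema],
                  global_schema: Optional[Schema],
                  merge_enabled: bool) -> Schema:
    """Pick the schema on the driver, reading as few footers as possible."""
    if global_schema is not None:
        # Metastore or DDL schema is authoritative: no footers are read here.
        return global_schema
    if merge_enabled:
        # No arbitrative schema: every footer must be read and merged.
        return merge_schemas([read_footer(p) for p in paths])
    # Assumption: without merging, one file's schema is taken as representative.
    return read_footer(paths[0])


def reconcile_on_task(file_schema: Schema, global_schema: Schema) -> List[str]:
    """Per-task check of one file against the global schema.

    Returns the global columns absent from the file (read as nulls in this
    sketch) and raises ValueError if a shared column has a different type.
    """
    missing: List[str] = []
    for name, typ in global_schema.items():
        if name not in file_schema:
            missing.append(name)
        elif file_schema[name] != typ:
            raise ValueError(
                f"column {name!r}: file has {file_schema[name]}, expected {typ}")
    return missing
```

With a metastore schema supplied, `driver_schema` returns it without ever calling `read_footer`; each task then runs `reconcile_on_task` against its own file, so type conflicts are detected locally instead of on the driver.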



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
