[ 
https://issues.apache.org/jira/browse/SPARK-6795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheng Lian updated SPARK-6795:
------------------------------
          Description: 
With the help of [Parquet MR PR 
#91|https://github.com/apache/incubator-parquet-mr/pull/91], which will be 
included in the official release of Parquet MR 1.6.0, it is now possible to 
avoid reading footers on the driver side completely when a global arbitrative 
schema is available.

Currently, the global schema can either come from the Hive metastore or be 
specified via data sources DDL. All tasks should verify Parquet data files and 
reconcile possible schema conflicts locally against this global schema.
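
As a rough illustration of the task-side reconciliation described above, here is a minimal Python sketch. It is not Spark's implementation. The `Schema` alias, the `reconcile` function, the `SchemaConflict` exception, and the policy of null-filling columns missing from a file while ignoring extra file columns are all assumptions made for this example.

```python
from typing import Dict, List, Tuple

# Hypothetical schema representation: column name -> type name.
Schema = Dict[str, str]


class SchemaConflict(Exception):
    """Raised when a file column's type disagrees with the global schema."""


def reconcile(global_schema: Schema, file_schema: Schema) -> Tuple[List[str], List[str]]:
    """Check one file's schema against the global schema.

    Returns (columns to read from the file, columns to fill with nulls).
    Columns present only in the file are ignored (assumed policy).
    """
    to_read: List[str] = []
    to_fill: List[str] = []
    for name, expected in global_schema.items():
        actual = file_schema.get(name)
        if actual is None:
            # The file lacks this column; the task would emit nulls for it.
            to_fill.append(name)
        elif actual == expected:
            to_read.append(name)
        else:
            raise SchemaConflict(
                f"column {name!r}: file has {actual}, global schema expects {expected}"
            )
    return to_read, to_fill
```

Because each task only needs the global schema and the footer of the file it is already reading, no footer has to be read on the driver in this scenario.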

However, when no global schema is available and schema merging is enabled, we 
still need to read schemas from all data files to infer a valid global schema.
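
The schema-merging fallback can be sketched in the same spirit. This is a simplified Python illustration under stated assumptions, not Spark's actual merging logic: `merge_two` and `infer_global_schema` are hypothetical names, schemas are modelled as plain name-to-type mappings, and any type mismatch is treated as an error.

```python
from functools import reduce
from typing import Dict, Iterable

# Hypothetical schema representation: column name -> type name.
Schema = Dict[str, str]


def merge_two(left: Schema, right: Schema) -> Schema:
    """Union two schemas, failing on a type mismatch for a shared column."""
    merged = dict(left)
    for name, dtype in right.items():
        if name in merged and merged[name] != dtype:
            raise ValueError(
                f"cannot merge column {name!r}: {merged[name]} vs {dtype}"
            )
        merged[name] = dtype
    return merged


def infer_global_schema(file_schemas: Iterable[Schema]) -> Schema:
    """Fold the schemas read from every data file into one global schema."""
    return reduce(merge_two, file_schemas, {})
```

The sketch makes the cost visible: every file's schema has to be collected before the result is known, which is why footer reading cannot be skipped in this case.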
     Target Version/s: 1.4.0
    Affects Version/s: 1.3.1
                       1.1.1
                       1.2.1
             Assignee: Cheng Lian

> Avoid reading Parquet footers on driver side when a global arbitrative 
> schema is available
> -------------------------------------------------------------------------------------------
>
>                 Key: SPARK-6795
>                 URL: https://issues.apache.org/jira/browse/SPARK-6795
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.0.2, 1.1.1, 1.2.1, 1.3.1
>            Reporter: Cheng Lian
>            Assignee: Cheng Lian
>            Priority: Critical


