[
https://issues.apache.org/jira/browse/SPARK-6795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Cheng Lian reassigned SPARK-6795:
---------------------------------
Assignee: Cheng Lian
> Avoid reading Parquet footers on driver side when a global arbitrative
> schema is available
> -------------------------------------------------------------------------------------------
>
> Key: SPARK-6795
> URL: https://issues.apache.org/jira/browse/SPARK-6795
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.0.2, 1.1.1, 1.2.1, 1.3.1
> Reporter: Cheng Lian
> Assignee: Cheng Lian
> Priority: Critical
>
> With the help of [Parquet MR PR
> #91|https://github.com/apache/incubator-parquet-mr/pull/91], which will be
> included in the official release of Parquet MR 1.6.0, it is now possible to
> avoid reading footers on the driver side completely when a global
> arbitrative schema is available.
> Currently, the global schema can either come from the Hive metastore or be
> specified via data sources DDL. All tasks should verify Parquet data files and
> reconcile possible schema conflicts locally against this global schema.
> However, when no global schema is available and schema merging is enabled, we
> still need to read schemas from all data files to infer a valid global schema.
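
The split described above can be sketched roughly as follows. This is only an illustration in plain Python, not Spark or Parquet MR code: schemas are simplified to dicts mapping column name to type name, and `resolve_schema`, `reconcile`, `merge_schemas`, and the `read_footers` callable are all hypothetical names. Treating a type mismatch as an error is likewise an assumption made for the sketch.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# Hypothetical simplification: a schema is just column name -> type name.
Schema = Dict[str, str]


def merge_schemas(schemas: Iterable[Schema]) -> Schema:
    """Infer a global schema by taking the union of all per-file schemas."""
    merged: Schema = {}
    for schema in schemas:
        for name, dtype in schema.items():
            # setdefault keeps the first type seen for a column.
            if merged.setdefault(name, dtype) != dtype:
                raise ValueError(f"conflicting types for column {name!r}")
    return merged


def resolve_schema(
    global_schema: Optional[Schema],
    read_footers: Callable[[], Iterable[Schema]],
) -> Schema:
    """Driver side: touch file footers only when no global schema exists."""
    if global_schema is not None:
        # Metastore or DDL schema is available, so no footer is read here.
        return global_schema
    # No global schema: fall back to reading every file's schema and merging.
    return merge_schemas(read_footers())


def reconcile(file_schema: Schema, global_schema: Schema) -> Tuple[List[str], List[str]]:
    """Task side: check one file against the global schema.

    Returns (columns present in the file, columns missing from the file),
    both in global-schema order.
    """
    present: List[str] = []
    missing: List[str] = []
    for name, dtype in global_schema.items():
        if name not in file_schema:
            missing.append(name)
        elif file_schema[name] != dtype:
            raise ValueError(f"column {name!r} conflicts with the global schema")
        else:
            present.append(name)
    return present, missing
```

Passing the footer reader as a callable keeps the expensive path lazy: it runs only when `global_schema` is `None`, which mirrors the case where schema merging still requires reading all data files.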