Yash Datta created SPARK-6632:
---------------------------------
Summary: Optimize the parquetSchema to metastore schema
reconciliation, so that the process is delegated to each map task itself
Key: SPARK-6632
URL: https://issues.apache.org/jira/browse/SPARK-6632
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 1.3.0
Reporter: Yash Datta
Fix For: 1.4.0
Currently, ParquetRelation2 first merges the schemas from all the part files
and then reconciles the merged schema with the metastore schema. This approach
does not scale when a table has thousands of partitions. Instead, we can start
from the metastore schema and reconcile the column names within each map task,
using the ReadSupport hooks provided by Parquet.
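
A minimal sketch of the per-task idea, in plain Python rather than the actual
Spark or Parquet code. It assumes that "reconciling" means matching metastore
column names to the names in a single part file case-insensitively; the
function names (reconcile_columns, columns_to_request) and the sample column
names are hypothetical and used only for illustration.

```python
def reconcile_columns(metastore_columns, file_columns):
    """Map each metastore column name to the matching name in ONE part file.

    Assumption: names are matched case-insensitively. A metastore column
    that is missing from this file maps to None.
    """
    by_lower = {}
    for name in file_columns:
        key = name.lower()
        if key in by_lower:
            # Two file columns differ only by case, so the match is ambiguous.
            raise ValueError(
                "ambiguous column names in file: %r and %r" % (by_lower[key], name)
            )
        by_lower[key] = name
    return {name: by_lower.get(name.lower()) for name in metastore_columns}


def columns_to_request(metastore_columns, file_columns):
    """Hypothetical stand-in for the work done inside one map task.

    Only the schema of the file this task reads is inspected, so no global
    merge over all part files is needed up front.
    """
    mapping = reconcile_columns(metastore_columns, file_columns)
    return [mapping[c] for c in metastore_columns if mapping[c] is not None]


if __name__ == "__main__":
    metastore = ["id", "username", "signup_date"]
    part_file = ["ID", "userName"]  # this file has no signup_date column
    print(reconcile_columns(metastore, part_file))
    print(columns_to_request(metastore, part_file))
```

Because each task only looks at its own file, the cost of reconciliation is
spread across the tasks instead of growing with the number of part files on
the driver.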