[jira] [Created] (SPARK-6632) Optimize the parquetSchema to metastore schema reconciliation, so that the process is delegated to each map task itself

Yash Datta (JIRA) Tue, 31 Mar 2015 06:24:08 -0700

Yash Datta created SPARK-6632:
---------------------------------

             Summary: Optimize the parquetSchema to metastore schema 
reconciliation, so that the process is delegated to each map task itself
                 Key: SPARK-6632
                 URL: https://issues.apache.org/jira/browse/SPARK-6632
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 1.3.0
            Reporter: Yash Datta
             Fix For: 1.4.0



Currently, ParquetRelation2 first merges the schemas from all the part-files 
and then reconciles the merged schema with the metastore schema. This approach 
does not scale when the table has thousands of partitions. Instead, we can 
start from the metastore schema and reconcile the column names within each map 
task, using the ReadSupport hooks provided by Parquet.
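
To illustrate the idea, here is a minimal, library-free sketch of what the 
per-task reconciliation could look like. It is not the ParquetRelation2 or 
ReadSupport code: the function name reconcile_for_file, the (name, type) tuple 
representation, and the sample column names are all hypothetical. It also 
assumes that the mismatch to resolve is letter case only (metastore names 
lower-cased, part-file names case-preserving), which the description above 
does not state explicitly.

```python
from typing import Dict, List, Optional, Tuple

Field = Tuple[str, str]  # (column name, type name); hypothetical representation


def reconcile_for_file(
    metastore_schema: List[Field],
    file_schema: List[Field],
) -> List[Tuple[str, Optional[str], str]]:
    """Resolve metastore columns against the schema of ONE part-file.

    Returns (metastore_name, file_name_or_None, metastore_type) triples.
    A None file name means the column is absent from this part-file and
    would have to be read as null.

    Assumption: names differ only by letter case.
    """
    # Index this file's columns by lower-cased name.
    by_lower: Dict[str, str] = {}
    for name, _type in file_schema:
        key = name.lower()
        if key in by_lower:
            # Two columns that differ only by case cannot be resolved.
            raise ValueError(f"ambiguous column in part-file: {name!r}")
        by_lower[key] = name

    # Walk the metastore schema, which is the schema the query is planned on.
    return [
        (name, by_lower.get(name.lower()), type_name)
        for name, type_name in metastore_schema
    ]


# Each task would do this for the file it reads; no global merge is needed.
metastore = [("userid", "bigint"), ("eventtime", "timestamp"), ("country", "string")]
part_file = [("userId", "int64"), ("eventTime", "int96")]
resolved = reconcile_for_file(metastore, part_file)
```

The point of the sketch is the cost profile: each task touches only the 
schema of the file it is already reading, so the driver never has to collect 
and merge the footers of every part-file up front.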



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

