[jira] [Created] (SPARK-9272) Persist information of individual partitions when persisting partitioned data source tables to metastore

Cheng Lian (JIRA) Wed, 22 Jul 2015 23:20:13 -0700

Cheng Lian created SPARK-9272:
---------------------------------

             Summary: Persist information of individual partitions when 
persisting partitioned data source tables to metastore
                 Key: SPARK-9272
                 URL: https://issues.apache.org/jira/browse/SPARK-9272
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 1.5.0
            Reporter: Cheng Lian



Currently, when a partitioned data source table is persisted to the Hive 
metastore, we only persist its partition columns. Information about individual 
partitions is not persisted. This forces us to run partition discovery before 
reading a persisted partitioned table, which hurts performance.
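
To illustrate why discovery is costly, here is a minimal sketch in plain 
Python (not Spark code) of what discovering partitions from a Hive-style 
directory layout involves. The function names and the `year=.../month=...` 
layout are assumptions made for the example; the point is only that every 
directory under the table root has to be listed before the first read.

```python
import os
import tempfile


def parse_partition_path(rel_path):
    """Parse 'year=2015/month=07' into [('year', '2015'), ('month', '07')]."""
    pairs = []
    for segment in rel_path.split(os.sep):
        if "=" not in segment:
            continue  # not a 'column=value' directory, e.g. '.'
        key, _, value = segment.partition("=")
        pairs.append((key, value))
    return pairs


def discover_partitions(root):
    """Walk the whole table directory and map each partition spec to its path.

    Hypothetical helper: any directory that directly contains files and whose
    relative path has 'column=value' segments is treated as one partition.
    """
    found = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        if not filenames:
            continue
        spec = tuple(parse_partition_path(os.path.relpath(dirpath, root)))
        if spec:
            found[spec] = dirpath
    return found
```

If the individual partitions were already recorded in the metastore, this 
full directory walk could be replaced by a metadata lookup.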

To fix this issue, we may persist partition information into the metastore. 
Specifically, the format should be compatible with Hive to ensure 
interoperability.

One approach to collecting partition values and partition directory paths 
for dynamically partitioned tables is to use accumulators to collect the 
expected information during the write job.
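
The accumulator idea can be sketched as follows, again in plain Python 
rather than against the Spark API. `PartitionAccumulator`, `run_write_task` 
and `partition_dir` are hypothetical names introduced for illustration: each 
write task records the partition specs it touches, and the driver merges the 
per-task results once the job finishes. The directory names follow the 
`column=value` convention; the escaping of special characters that Hive 
applies to partition paths is omitted here.

```python
def partition_dir(spec):
    """Build a Hive-style relative directory such as 'year=2015/month=7'."""
    return "/".join(f"{column}={value}" for column, value in spec)


class PartitionAccumulator:
    """Hypothetical stand-in for an accumulator.

    Tasks call add(); the driver calls merge(). Merging is a dict union, so
    the result does not depend on the order in which tasks finish.
    """

    def __init__(self):
        self.partitions = {}

    def add(self, spec):
        self.partitions[spec] = partition_dir(spec)

    def merge(self, other):
        self.partitions.update(other.partitions)
        return self


def run_write_task(rows, partition_columns):
    """Simulate one write task: record the partition spec of each row."""
    acc = PartitionAccumulator()
    for row in rows:
        spec = tuple((c, str(row[c])) for c in partition_columns)
        acc.add(spec)
        # A real task would also write the row under partition_dir(spec).
    return acc
```

After the job, the driver would hold one entry per partition (values plus 
relative directory), which is the information that would need to be 
persisted to the metastore.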



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
