[jira] [Created] (SPARK-6904) SparkSql - HiveContext - optimize reading partition data from metastore

Thomas Graves (JIRA) Tue, 14 Apr 2015 09:03:13 -0700

Thomas Graves created SPARK-6904:
------------------------------------

             Summary: SparkSql - HiveContext - optimize reading partition data from metastore
                 Key: SPARK-6904
                 URL: https://issues.apache.org/jira/browse/SPARK-6904
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 1.3.0
            Reporter: Thomas Graves




I was trying out Spark SQL using the HiveContext and ran a select on a 
partitioned table with a large number of partitions (16,000+). It took over 
6 minutes before the job even started. It appears to have been querying the 
Hive metastore, which returned a large amount of data that I'm guessing is 
partition information. Running the same query using Hive takes 45 seconds 
for the entire job. 

It would be nice if we could optimize how partition data is read from the 
metastore.
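
One possible reading of this request, which is an assumption and not something 
the report states, is to have the metastore return only the partitions a query 
needs, rather than fetching every partition and filtering afterwards. The 
sketch below illustrates the difference with a hypothetical in-memory stand-in 
(FakeMetastore, its methods, and the "day" partition column are invented for 
illustration and are not Hive or Spark APIs).

```python
from typing import Callable, Dict, List

Partition = Dict[str, str]


class FakeMetastore:
    """Hypothetical in-memory stand-in for a metastore; not a real Hive API."""

    def __init__(self, partitions: List[Partition]) -> None:
        self._partitions = partitions
        self.returned = 0  # number of partition records handed back to callers

    def get_all_partitions(self) -> List[Partition]:
        # Returns metadata for every partition of the table.
        self.returned += len(self._partitions)
        return list(self._partitions)

    def get_partitions_by_filter(
        self, pred: Callable[[Partition], bool]
    ) -> List[Partition]:
        # Applies the predicate on the metastore side and returns only matches.
        matches = [p for p in self._partitions if pred(p)]
        self.returned += len(matches)
        return matches


def make_store() -> FakeMetastore:
    # 16,000 partitions keyed by a hypothetical "day" column.
    return FakeMetastore([{"day": f"d{i:05d}"} for i in range(16000)])


def wanted(p: Partition) -> bool:
    return p["day"] == "d00042"


# Fetch everything, then filter on the client side.
eager = make_store()
eager_result = [p for p in eager.get_all_partitions() if wanted(p)]

# Push the filter down so only matching partitions come back.
pruned = make_store()
pruned_result = pruned.get_partitions_by_filter(wanted)

print(eager.returned, pruned.returned)
```

Both approaches select the same partition, but the first transfers metadata 
for all 16,000 partitions while the second transfers a single record. Whether 
this matches the actual cause of the delay described above is not confirmed 
by the report.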



