Thomas Graves created SPARK-6904:
------------------------------------
Summary: SparkSql - HiveContext - optimize reading partition data
from metastore
Key: SPARK-6904
URL: https://issues.apache.org/jira/browse/SPARK-6904
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 1.3.0
Reporter: Thomas Graves
I was trying out Spark SQL using the HiveContext and running a select on a
partitioned table with a large number of partitions (16,000+). It took over 6
minutes before the job even started. During that time it appeared to be
querying the Hive metastore and receiving a large amount of data back, which
I'm guessing is partition information. Running the same query in Hive takes 45
seconds for the entire job.
It would be nice if we could optimize how partition data is read from the
metastore.
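
To illustrate one possible reading of this request, here is a toy sketch that
compares fetching metadata for every partition with fetching only the
partitions a query needs. It assumes the query filters on a partition column,
which the report above does not state. ToyMetastore, list_all, list_matching
and the part=NNNNN naming are all hypothetical; none of this is the Hive or
Spark API.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMetastore:
    """Hypothetical in-memory stand-in for a metastore (not a Hive/Spark API)."""

    # Maps a partition name to its metadata record.
    partitions: dict = field(default_factory=dict)
    # Counts how many partition records were shipped back to the client.
    records_returned: int = 0

    def list_all(self):
        """Return metadata for every partition, regardless of the query."""
        result = list(self.partitions.items())
        self.records_returned += len(result)
        return result

    def list_matching(self, predicate):
        """Return metadata only for partitions whose name satisfies predicate."""
        result = [(name, meta) for name, meta in self.partitions.items()
                  if predicate(name)]
        self.records_returned += len(result)
        return result


def build_store(n):
    """Create a toy store with n partitions named part=00000, part=00001, ..."""
    return ToyMetastore({
        f"part={i:05d}": {"location": f"/warehouse/t/part={i:05d}"}
        for i in range(n)
    })


store = build_store(16000)

# Unfiltered path: every partition record comes back to the client.
store.list_all()
fetched_all = store.records_returned

# Filtered path: only the partition the (assumed) query predicate selects.
store.records_returned = 0
wanted = store.list_matching(lambda name: name == "part=00042")
fetched_filtered = store.records_returned
```

In this toy model the amount of metadata transferred scales with the number of
matching partitions and not with the total partition count, which is the kind
of saving being asked for. Whether it applies to the query above depends on
the query having a partition filter.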