Davies Liu created SPARK-17063:
----------------------------------
Summary: MSCK REPAIR TABLE is super slow with Hive metastore
Key: SPARK-17063
URL: https://issues.apache.org/jira/browse/SPARK-17063
Project: Spark
Issue Type: Improvement
Components: SQL
Reporter: Davies Liu
Assignee: Davies Liu
Repairing a table with thousands of partitions can take hundreds of seconds.
The Hive metastore can only add a few partitions per second, because it
lists all the files for each partition to gather the fast stats (number of
files, total size of files).
We could improve this by listing the files in Spark in parallel, then sending
the fast stats to the Hive metastore to avoid this sequential listing.
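
A rough sketch of the idea, not the actual Spark change: it assumes the
partition directories are on a local file system and uses a thread pool in
place of a Spark job. The function names gather_fast_stats and
gather_all_fast_stats are hypothetical. The keys numFiles and totalSize follow
the names Hive uses for these partition parameters. Passing the result to the
metastore is left out.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def gather_fast_stats(partition_dir):
    """Count the files directly under one partition directory and sum their sizes."""
    num_files = 0
    total_size = 0
    with os.scandir(partition_dir) as entries:
        for entry in entries:
            # Only regular files count; nested directories are skipped.
            if entry.is_file():
                num_files += 1
                total_size += entry.stat().st_size
    return {"numFiles": num_files, "totalSize": total_size}


def gather_all_fast_stats(partition_dirs, max_workers=8):
    """List every partition concurrently instead of one after another.

    Hypothetical helper: returns {partition_dir: fast stats}, which a caller
    could attach to each partition before adding it to the metastore.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs each dir with its stats.
        stats = list(pool.map(gather_fast_stats, partition_dirs))
    return dict(zip(partition_dirs, stats))
```

Listing is I/O bound, so threads are enough for this illustration; in Spark
the same work would more likely be distributed across executors.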