Davies Liu created SPARK-17063:
----------------------------------

             Summary: MSCK REPAIR TABLE is super slow with Hive metastore
                 Key: SPARK-17063
                 URL: https://issues.apache.org/jira/browse/SPARK-17063
             Project: Spark
          Issue Type: Improvement
          Components: SQL
            Reporter: Davies Liu
            Assignee: Davies Liu


When repairing a table with thousands of partitions, the command can take 
hundreds of seconds. The Hive metastore can add only a few partitions per 
second, because it lists all the files in each partition to gather the fast 
stats (number of files, total size of files).

We could improve this by listing the files in Spark in parallel, then sending 
the fast stats to the Hive metastore so that it can skip this sequential listing.
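
As a rough illustration of the idea (not the actual Spark change), the sketch 
below gathers the two fast stats for many partition directories concurrently. 
It makes several assumptions: the local filesystem stands in for HDFS, a 
thread pool stands in for Spark's parallelism, and the function names 
gather_fast_stats and gather_all_fast_stats are hypothetical. The keys 
numFiles and totalSize follow the names Hive uses for these statistics.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def gather_fast_stats(partition_dir):
    """Count the regular files directly inside one partition directory
    and sum their sizes in bytes."""
    num_files = 0
    total_size = 0
    with os.scandir(partition_dir) as entries:
        for entry in entries:
            if entry.is_file():
                num_files += 1
                total_size += entry.stat().st_size
    return {"numFiles": num_files, "totalSize": total_size}


def gather_all_fast_stats(partition_dirs, max_workers=8):
    """List every partition directory concurrently instead of one by one.

    Returns a dict mapping each partition directory to its fast stats.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs each dir with its stats.
        stats = list(pool.map(gather_fast_stats, partition_dirs))
    return dict(zip(partition_dirs, stats))
```

The resulting per-partition stats could then be attached to the partitions 
when they are added, so the metastore does not need to list the files itself.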



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
