[ 
https://issues.apache.org/jira/browse/HIVE-14462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated HIVE-14462:
------------------------------------
    Status: Open  (was: Patch Available)


In some corner cases, a partition can contain multiple nested 
directories (e.g. table/ii=1/jj=15/q=10/r=20/s=30/000000_0 and 
table/ii=1/jj=15/q=11/r=22/s=33/000000_0, where ii and jj are the only 
partition columns).
{{HiveMetastoreChecker.getPartitionName}} ends up resolving the partition 
names as "ii=1/jj=15/q=11/r=22/s=33" and "ii=1/jj=15/q=10/r=20/s=30".
When msck is run, the add fails with a duplicate-partition exception for 
ii=1, jj=15 in the metastore (MS). msck then falls back to 
{{msckAddPartitionsOneByOne}}, which tries to repair one partition at a 
time and ignores any exceptions. The job still completes, but it makes 
many calls to MS and can be very slow. I will attach the latest patch in RB.
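
To illustrate the problem described above, here is a minimal sketch of resolving a partition name from only the leading path components that match the declared partition columns, then de-duplicating the results. This is not the code from the attached patch: the class {{PartitionNameSketch}} and its methods are hypothetical, and the sketch assumes paths are given relative to the table directory.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not HiveMetastoreChecker and not the patch.
public class PartitionNameSketch {

    // Keep only the leading "col=value" components that correspond to the
    // table's partition columns, ignoring deeper nested directories.
    // Assumes relativePath is relative to the table root directory.
    static String partitionName(String relativePath, List<String> partCols) {
        String[] components = relativePath.split("/");
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < partCols.size() && i < components.length; i++) {
            String[] kv = components[i].split("=", 2);
            if (kv.length != 2 || !kv[0].equals(partCols.get(i))) {
                throw new IllegalArgumentException(
                    "Unexpected path component: " + components[i]);
            }
            kept.add(components[i]);
        }
        if (kept.size() != partCols.size()) {
            throw new IllegalArgumentException(
                "Path is shallower than the partition spec: " + relativePath);
        }
        return String.join("/", kept);
    }

    // Collapse many file paths into the distinct partitions they belong to,
    // so each partition would be submitted to the metastore only once.
    static Set<String> distinctPartitions(List<String> paths,
                                          List<String> partCols) {
        Set<String> names = new LinkedHashSet<>();
        for (String path : paths) {
            names.add(partitionName(path, partCols));
        }
        return names;
    }

    public static void main(String[] args) {
        List<String> partCols = Arrays.asList("ii", "jj");
        List<String> paths = Arrays.asList(
            "ii=1/jj=15/q=10/r=20/s=30/000000_0",
            "ii=1/jj=15/q=11/r=22/s=33/000000_0");
        // Both files resolve to the same partition.
        System.out.println(distinctPartitions(paths, partCols)); // prints [ii=1/jj=15]
    }
}
```

With names truncated this way, both example files map to the single partition ii=1/jj=15, so a batch add would not contain duplicates and the one-by-one fallback would not be triggered for this case.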

Without Patch:
=============
msck runtime for 10000 partitions in small cluster: *370 seconds*

With Patch:
===========
msck runtime for 10000 partitions in small cluster: *62 seconds*

> Reduce number of partition check calls in add_partitions
> --------------------------------------------------------
>
>                 Key: HIVE-14462
>                 URL: https://issues.apache.org/jira/browse/HIVE-14462
>             Project: Hive
>          Issue Type: Improvement
>          Components: Metastore
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>            Priority: Minor
>         Attachments: HIVE-14462.1.patch, HIVE-14462.2.patch, 
> HIVE-14462.3.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)