[ https://issues.apache.org/jira/browse/HIVE-14462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rajesh Balamohan updated HIVE-14462:
------------------------------------
    Status: Open  (was: Patch Available)

In some corner cases, a partition can contain multiple nested directories, e.g. table/ii=1/jj=15/q=10/r=20/s=30/000000_0 and table/ii=1/jj=15/q=11/r=22/s=33/000000_0, where ii and jj are the only partition columns. {{HiveMetastoreChecker.getPartitionName}} ends up resolving the partition names as "ii=1/jj=15/q=11/r=22/s=33" and "ii=1/jj=15/q=10/r=20/s=30". When msck is run, both names map to the same partition (ii=1, jj=15), so the metastore (MS) throws a duplicate-partition exception. msck then falls back to {{msckAddPartitionsOneByOne}}, which tries to repair one partition at a time and ignores any exceptions. The job does complete, but it makes a large number of calls to MS and can be too slow.

I will attach the latest patch in RB.

Without Patch:
=============
msck runtime for 10000 partitions in small cluster: *370 seconds*

With Patch:
===========
msck runtime for 10000 partitions in small cluster: *62 seconds*

> Reduce number of partition check calls in add_partitions
> --------------------------------------------------------
>
>                 Key: HIVE-14462
>                 URL: https://issues.apache.org/jira/browse/HIVE-14462
>             Project: Hive
>          Issue Type: Improvement
>          Components: Metastore
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>            Priority: Minor
>         Attachments: HIVE-14462.1.patch, HIVE-14462.2.patch,
> HIVE-14462.3.patch
>
>

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
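
For illustration only, the duplicate-name problem described in the comment above can be sketched in a few lines of Java. This is not the attached patch and not Hive's actual code: the class {{PartitionNameSketch}}, its methods, and the idea of truncating each path at the number of partition columns are all assumptions made for this example. It only shows why two nested directories under ii=1/jj=15 should collapse to a single partition name before anything is sent to the metastore.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Hive.
final class PartitionNameSketch {

    /**
     * Keeps only the leading key=value components of a path relative to the
     * table directory, stopping once partitionDepth components are collected.
     * partitionDepth is assumed to be the number of partition columns.
     */
    static String partitionName(String relativePath, int partitionDepth) {
        List<String> parts = new ArrayList<>();
        for (String component : relativePath.split("/")) {
            if (parts.size() == partitionDepth) {
                break; // deeper directories are not partition columns
            }
            if (!component.contains("=")) {
                break; // reached a data file or a non-partition directory
            }
            parts.add(component);
        }
        return String.join("/", parts);
    }

    /** Collapses many file paths into the distinct partition names they belong to. */
    static Set<String> distinctPartitionNames(List<String> relativePaths, int partitionDepth) {
        Set<String> names = new LinkedHashSet<>();
        for (String path : relativePaths) {
            names.add(partitionName(path, partitionDepth));
        }
        return names;
    }

    public static void main(String[] args) {
        List<String> paths = Arrays.asList(
            "ii=1/jj=15/q=10/r=20/s=30/000000_0",
            "ii=1/jj=15/q=11/r=22/s=33/000000_0");
        // Two partition columns (ii, jj): both paths map to the same partition.
        System.out.println(distinctPartitionNames(paths, 2)); // prints [ii=1/jj=15]
    }
}
```

With names deduplicated this way, a batch add would see ii=1/jj=15 only once, so the duplicate-partition exception and the one-by-one fallback would not be triggered by nested directories.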