[ https://issues.apache.org/jira/browse/HIVE-16299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vihang Karajgaonkar updated HIVE-16299:
---------------------------------------
    Attachment: HIVE-16299.02.patch

Updating the patch with a better implementation. The patch changes the parallel file listing algorithm so that directory structures that do not follow the partition key spec are not searched. This early-exit strategy will also help improve query response time on slower filesystems like S3 and when the partition directory structure does not conform to the partition definitions. MSCK will throw an exception or log a warning, depending on the value of the {{hive.msck.path.validation}} configuration.

> MSCK REPAIR TABLE should enforce partition key order when adding unknown partitions
> -----------------------------------------------------------------------------------
>
>                 Key: HIVE-16299
>                 URL: https://issues.apache.org/jira/browse/HIVE-16299
>             Project: Hive
>          Issue Type: Bug
>          Components: Metastore
>    Affects Versions: 2.2.0
>            Reporter: Dudu Markovitz
>            Assignee: Vihang Karajgaonkar
>            Priority: Minor
>         Attachments: HIVE-16299.01.patch, HIVE-16299.02.patch
>
>
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMetaStoreChecker.java
> static String getPartitionName(Path tablePath, Path partitionPath, Set<String> partCols)
> ------------------------------------------------------------------------------------
> MSCK REPAIR validates that any sub-directory is in the format col=val and
> that there is indeed a partition column named "col".
> However, there is no validation of the position of the partition column within
> the path. As a result, false partitions are created, along with directories
> that match those partitions.
>
> e.g. 1
> hive> dfs -mkdir -p /user/hive/warehouse/t/a=1/a=2/a=3/b=4/c=5;
> hive> create external table t (i int) partitioned by (a int,b int,c int) ;
> OK
> hive> msck repair table t;
> OK
> Partitions not in metastore: t:a=1/a=2/a=3/b=4/c=5
> Repair: Added partition to metastore t:a=1/a=2/a=3/b=4/c=5
> Time taken: 0.563 seconds, Fetched: 2 row(s)
> hive> show partitions t;
> OK
> a=3/b=4/c=5
> hive> dfs -ls -R /user/hive/warehouse/t;
> drwxr-xr-x - cloudera supergroup 0 2017-03-26 13:07 /user/hive/warehouse/t/a=1
> drwxr-xr-x - cloudera supergroup 0 2017-03-26 13:07 /user/hive/warehouse/t/a=1/a=2
> drwxr-xr-x - cloudera supergroup 0 2017-03-26 13:07 /user/hive/warehouse/t/a=1/a=2/a=3
> drwxr-xr-x - cloudera supergroup 0 2017-03-26 13:07 /user/hive/warehouse/t/a=1/a=2/a=3/b=4
> drwxr-xr-x - cloudera supergroup 0 2017-03-26 13:07 /user/hive/warehouse/t/a=1/a=2/a=3/b=4/c=5
> drwxrwxrwx - cloudera supergroup 0 2017-03-26 13:07 /user/hive/warehouse/t/a=3
> drwxrwxrwx - cloudera supergroup 0 2017-03-26 13:07 /user/hive/warehouse/t/a=3/b=4
> drwxrwxrwx - cloudera supergroup 0 2017-03-26 13:07 /user/hive/warehouse/t/a=3/b=4/c=5
>
> e.g. 2
> hive> dfs -mkdir -p /user/hive/warehouse/t/c=3/b=2/a=1;
> hive> create external table t (i int) partitioned by (a int,b int,c int);
> OK
> hive> msck repair table t;
> OK
> Partitions not in metastore: t:c=3/b=2/a=1
> Repair: Added partition to metastore t:c=3/b=2/a=1
> Time taken: 0.512 seconds, Fetched: 2 row(s)
> hive> show partitions t;
> OK
> a=1/b=2/c=3
> hive> dfs -ls -R /user/hive/warehouse/t;
> drwxrwxrwx - cloudera supergroup 0 2017-03-26 13:13 /user/hive/warehouse/t/a=1
> drwxrwxrwx - cloudera supergroup 0 2017-03-26 13:13 /user/hive/warehouse/t/a=1/b=2
> drwxrwxrwx - cloudera supergroup 0 2017-03-26 13:13 /user/hive/warehouse/t/a=1/b=2/c=3
> drwxr-xr-x - cloudera supergroup 0 2017-03-26 13:12 /user/hive/warehouse/t/c=3
> drwxr-xr-x - cloudera supergroup 0 2017-03-26 13:12 /user/hive/warehouse/t/c=3/b=2
> drwxr-xr-x - cloudera supergroup 0 2017-03-26 13:12 /user/hive/warehouse/t/c=3/b=2/a=1
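
For illustration only, the ordering check this issue asks for might look roughly like the sketch below. It is not the code in the attached patches: the class name PartitionPathCheck and the method orderedPartitionName are hypothetical, the path is handled as a plain string relative to the table directory rather than as a Hadoop Path, and the case-insensitive column comparison is an assumption. The loop returns as soon as a component is out of order, which is the early-exit idea described in the comment above.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical helper, not part of Hive: checks that a partition directory,
// given relative to the table directory, follows the declared key order.
final class PartitionPathCheck {

    /**
     * Returns the partition name when every component is "col=val" and the
     * columns appear in exactly the declared order; returns empty otherwise.
     */
    static Optional<String> orderedPartitionName(String relativePath, List<String> partCols) {
        String[] components = relativePath.split("/");
        // A valid partition directory is exactly as deep as the number of keys.
        if (components.length != partCols.size()) {
            return Optional.empty();
        }
        for (int depth = 0; depth < components.length; depth++) {
            String[] keyValue = components[depth].split("=", 2);
            // The component must have the form col=val with a non-empty value.
            if (keyValue.length != 2 || keyValue[1].isEmpty()) {
                return Optional.empty();
            }
            // Early exit: the column at this depth must be the key declared for
            // this depth (case-insensitive comparison is an assumption here).
            if (!keyValue[0].equalsIgnoreCase(partCols.get(depth))) {
                return Optional.empty();
            }
        }
        return Optional.of(String.join("/", components));
    }
}
```

Under these assumptions, with partition columns (a, b, c), the path a=1/b=2/c=3 is accepted, while a=1/a=2/a=3/b=4/c=5 from e.g. 1 and c=3/b=2/a=1 from e.g. 2 are both rejected, so neither would be added to the metastore.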