[ 
https://issues.apache.org/jira/browse/HIVE-16299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15949890#comment-15949890
 ] 

Vihang Karajgaonkar commented on HIVE-16299:
--------------------------------------------

That's a good point. I think we can use this information to exit early during 
the listing phase itself. If there are invalid partition directories, we don't 
need to list their contents; we can throw an error or skip them, depending on 
the value of {{hive.msck.path.validation}}.
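
As a rough illustration of the early exit described above, the check could be 
done one directory level at a time, so that listing stops at the first 
component whose key is not the partition column expected at that depth. This 
is only a sketch and not the code in the attached patch: the class and method 
names (PartitionPathCheck, matchesColumnAt, firstInvalidDepth) are 
hypothetical, and it assumes partition column names compare 
case-insensitively.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Hive: validates partition directories
// level by level so a listing can stop at the first invalid directory.
class PartitionPathCheck {

    // True if dirName has the form key=value and key is the partition
    // column expected at this depth (assumption: case-insensitive match).
    static boolean matchesColumnAt(String dirName, int depth, List<String> partCols) {
        if (depth >= partCols.size()) {
            return false; // deeper than the declared partition columns
        }
        int eq = dirName.indexOf('=');
        if (eq <= 0) {
            return false; // no '=' or empty key
        }
        return dirName.substring(0, eq).equalsIgnoreCase(partCols.get(depth));
    }

    // Returns the 0-based depth of the first invalid component of a path
    // relative to the table directory, or -1 if every component is valid.
    static int firstInvalidDepth(String relativePath, List<String> partCols) {
        String[] dirs = relativePath.split("/");
        for (int depth = 0; depth < dirs.length; depth++) {
            if (!matchesColumnAt(dirs[depth], depth, partCols)) {
                return depth; // nothing below this directory needs listing
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> partCols = Arrays.asList("a", "b", "c");
        System.out.println(firstInvalidDepth("a=1/b=2/c=3", partCols));
        System.out.println(firstInvalidDepth("a=1/a=2/a=3/b=4/c=5", partCols));
        System.out.println(firstInvalidDepth("c=3/b=2/a=1", partCols));
    }
}
```

With the two layouts from the issue description, the second path would be 
rejected at the second level and the third at the first level, at which point 
the caller could throw or skip according to {{hive.msck.path.validation}}.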

> MSCK REPAIR TABLE should enforce partition key order when adding unknown 
> partitions
> -----------------------------------------------------------------------------------
>
>                 Key: HIVE-16299
>                 URL: https://issues.apache.org/jira/browse/HIVE-16299
>             Project: Hive
>          Issue Type: Bug
>          Components: Metastore
>    Affects Versions: 2.2.0
>            Reporter: Dudu Markovitz
>            Assignee: Vihang Karajgaonkar
>            Priority: Minor
>         Attachments: HIVE-16299.01.patch
>
>
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMetaStoreChecker.java
> static String getPartitionName(Path tablePath, Path partitionPath, 
> Set<String> partCols)
> ------------------------------------------------------------------------------------
> MSCK REPAIR validates that every sub-directory is in the format col=val and 
> that there is indeed a partition column named "col".
> However, there is no validation of the position of each partition column in 
> the path. As a result, false partitions are created, along with new 
> directories that match those partitions. 
> e.g. 1
> hive> dfs -mkdir -p /user/hive/warehouse/t/a=1/a=2/a=3/b=4/c=5;
> hive> create external table t (i int) partitioned by (a int,b int,c int) ;
> OK
> hive> msck repair table t;
> OK
> Partitions not in metastore:  t:a=1/a=2/a=3/b=4/c=5
> Repair: Added partition to metastore t:a=1/a=2/a=3/b=4/c=5
> Time taken: 0.563 seconds, Fetched: 2 row(s)
> hive> show partitions t;
> OK
> a=3/b=4/c=5
> hive> dfs -ls -R /user/hive/warehouse/t;
> drwxr-xr-x   - cloudera supergroup          0 2017-03-26 13:07 
> /user/hive/warehouse/t/a=1
> drwxr-xr-x   - cloudera supergroup          0 2017-03-26 13:07 
> /user/hive/warehouse/t/a=1/a=2
> drwxr-xr-x   - cloudera supergroup          0 2017-03-26 13:07 
> /user/hive/warehouse/t/a=1/a=2/a=3
> drwxr-xr-x   - cloudera supergroup          0 2017-03-26 13:07 
> /user/hive/warehouse/t/a=1/a=2/a=3/b=4
> drwxr-xr-x   - cloudera supergroup          0 2017-03-26 13:07 
> /user/hive/warehouse/t/a=1/a=2/a=3/b=4/c=5
> drwxrwxrwx   - cloudera supergroup          0 2017-03-26 13:07 
> /user/hive/warehouse/t/a=3
> drwxrwxrwx   - cloudera supergroup          0 2017-03-26 13:07 
> /user/hive/warehouse/t/a=3/b=4
> drwxrwxrwx   - cloudera supergroup          0 2017-03-26 13:07 
> /user/hive/warehouse/t/a=3/b=4/c=5
> e.g. 2
> hive> dfs -mkdir -p /user/hive/warehouse/t/c=3/b=2/a=1;
> hive> create external table t (i int) partitioned by (a int,b int,c int);
> OK
> hive> msck repair table t;
> OK
> Partitions not in metastore:  t:c=3/b=2/a=1
> Repair: Added partition to metastore t:c=3/b=2/a=1
> Time taken: 0.512 seconds, Fetched: 2 row(s)
> hive> show partitions t;
> OK
> a=1/b=2/c=3
> hive> dfs -ls -R  /user/hive/warehouse/t;
> drwxrwxrwx   - cloudera supergroup          0 2017-03-26 13:13 
> /user/hive/warehouse/t/a=1
> drwxrwxrwx   - cloudera supergroup          0 2017-03-26 13:13 
> /user/hive/warehouse/t/a=1/b=2
> drwxrwxrwx   - cloudera supergroup          0 2017-03-26 13:13 
> /user/hive/warehouse/t/a=1/b=2/c=3
> drwxr-xr-x   - cloudera supergroup          0 2017-03-26 13:12 
> /user/hive/warehouse/t/c=3
> drwxr-xr-x   - cloudera supergroup          0 2017-03-26 13:12 
> /user/hive/warehouse/t/c=3/b=2
> drwxr-xr-x   - cloudera supergroup          0 2017-03-26 13:12 
> /user/hive/warehouse/t/c=3/b=2/a=1



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
