[ 
https://issues.apache.org/jira/browse/HIVE-16299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar updated HIVE-16299:
---------------------------------------
    Attachment: HIVE-16299.02.patch

Updating the patch with a better implementation. The patch changes the 
parallel file listing algorithm so that directories which do not follow the 
partition key spec are not searched. This early-exit strategy should also 
improve query response time on slower filesystems like S3 and in cases where 
the partition directory structure does not conform to the partition 
definitions. MSCK will throw an exception or log a warning based on the value 
of the {{hive.msck.path.validation}} configuration.
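
For readers who want to see the early-exit idea in isolation, here is a minimal sketch. It is not the code in HIVE-16299.02.patch: the class {{PartitionWalkSketch}}, the in-memory {{Dir}} tree standing in for the filesystem, and the boolean {{throwOnInvalid}} (standing in for the configured validation behaviour) are all hypothetical names chosen for illustration. The only rule it assumes is that the directory at depth d must be named after the d-th partition key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of a depth-aware partition directory walk.
// Not taken from the Hive code base or from the attached patch.
class PartitionWalkSketch {

    /** In-memory stand-in for a directory: child name -> child directory. */
    static final class Dir {
        final Map<String, Dir> children = new TreeMap<>();

        Dir child(String name) {
            return children.computeIfAbsent(name, k -> new Dir());
        }
    }

    /** Builds a directory tree from slash-separated relative paths. */
    static Dir tree(String... paths) {
        Dir root = new Dir();
        for (String p : paths) {
            Dir cur = root;
            for (String part : p.split("/")) {
                cur = cur.child(part);
            }
        }
        return root;
    }

    /**
     * Collects relative paths whose component at depth d is "<partCols[d]>=<value>".
     * A non-conforming directory is reported and never descended into.
     */
    static void walk(Dir dir, String relPath, int depth, List<String> partCols,
                     boolean throwOnInvalid, List<String> found, List<String> warnings) {
        if (depth == partCols.size()) {
            found.add(relPath); // all partition keys matched, in order
            return;
        }
        String expectedPrefix = partCols.get(depth) + "=";
        for (Map.Entry<String, Dir> e : dir.children.entrySet()) {
            String name = e.getKey();
            String childPath = relPath.isEmpty() ? name : relPath + "/" + name;
            boolean ok = name.startsWith(expectedPrefix)
                    && name.length() > expectedPrefix.length();
            if (!ok) {
                String msg = "Unexpected directory " + childPath
                        + ", expected partition key " + partCols.get(depth);
                if (throwOnInvalid) {
                    throw new IllegalStateException(msg);
                }
                warnings.add(msg);
                continue; // early exit: nothing below this directory is listed
            }
            walk(e.getValue(), childPath, depth + 1, partCols,
                    throwOnInvalid, found, warnings);
        }
    }

    public static void main(String[] args) {
        Dir root = tree("a=1/a=2/a=3/b=4/c=5", "a=7/b=8/c=9", "c=3/b=2/a=1");
        List<String> found = new ArrayList<>();
        List<String> warnings = new ArrayList<>();
        walk(root, "", 0, Arrays.asList("a", "b", "c"), false, found, warnings);
        System.out.println(found);    // prints [a=7/b=8/c=9]
        System.out.println(warnings); // two warnings: a=1/a=2 and c=3
    }
}
```

In this sketch the subtree under a=1/a=2 and the whole c=3 tree are skipped after a single listing each, which is where the saving on a slow filesystem would come from.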

> MSCK REPAIR TABLE should enforce partition key order when adding unknown 
> partitions
> -----------------------------------------------------------------------------------
>
>                 Key: HIVE-16299
>                 URL: https://issues.apache.org/jira/browse/HIVE-16299
>             Project: Hive
>          Issue Type: Bug
>          Components: Metastore
>    Affects Versions: 2.2.0
>            Reporter: Dudu Markovitz
>            Assignee: Vihang Karajgaonkar
>            Priority: Minor
>         Attachments: HIVE-16299.01.patch, HIVE-16299.02.patch
>
>
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMetaStoreChecker.java
> static String getPartitionName(Path tablePath, Path partitionPath, Set<String> partCols)
> ------------------------------------------------------------------------------------
> MSCK REPAIR validates that any sub-directory is in the format col=val and 
> that there is indeed a partition column named "col".
> However, the position of each partition column within the path is not 
> validated. As a result, false partitions are created, along with 
> directories that match those partitions.
> e.g. 1
> hive> dfs -mkdir -p /user/hive/warehouse/t/a=1/a=2/a=3/b=4/c=5;
> hive> create external table t (i int) partitioned by (a int,b int,c int) ;
> OK
> hive> msck repair table t;
> OK
> Partitions not in metastore:  t:a=1/a=2/a=3/b=4/c=5
> Repair: Added partition to metastore t:a=1/a=2/a=3/b=4/c=5
> Time taken: 0.563 seconds, Fetched: 2 row(s)
> hive> show partitions t;
> OK
> a=3/b=4/c=5
> hive> dfs -ls -R /user/hive/warehouse/t;
> drwxr-xr-x   - cloudera supergroup          0 2017-03-26 13:07 /user/hive/warehouse/t/a=1
> drwxr-xr-x   - cloudera supergroup          0 2017-03-26 13:07 /user/hive/warehouse/t/a=1/a=2
> drwxr-xr-x   - cloudera supergroup          0 2017-03-26 13:07 /user/hive/warehouse/t/a=1/a=2/a=3
> drwxr-xr-x   - cloudera supergroup          0 2017-03-26 13:07 /user/hive/warehouse/t/a=1/a=2/a=3/b=4
> drwxr-xr-x   - cloudera supergroup          0 2017-03-26 13:07 /user/hive/warehouse/t/a=1/a=2/a=3/b=4/c=5
> drwxrwxrwx   - cloudera supergroup          0 2017-03-26 13:07 /user/hive/warehouse/t/a=3
> drwxrwxrwx   - cloudera supergroup          0 2017-03-26 13:07 /user/hive/warehouse/t/a=3/b=4
> drwxrwxrwx   - cloudera supergroup          0 2017-03-26 13:07 /user/hive/warehouse/t/a=3/b=4/c=5
> e.g. 2
> hive> dfs -mkdir -p /user/hive/warehouse/t/c=3/b=2/a=1;
> hive> create external table t (i int) partitioned by (a int,b int,c int);
> OK
> hive> msck repair table t;
> OK
> Partitions not in metastore:  t:c=3/b=2/a=1
> Repair: Added partition to metastore t:c=3/b=2/a=1
> Time taken: 0.512 seconds, Fetched: 2 row(s)
> hive> show partitions t;
> OK
> a=1/b=2/c=3
> hive> dfs -ls -R  /user/hive/warehouse/t;
> drwxrwxrwx   - cloudera supergroup          0 2017-03-26 13:13 /user/hive/warehouse/t/a=1
> drwxrwxrwx   - cloudera supergroup          0 2017-03-26 13:13 /user/hive/warehouse/t/a=1/b=2
> drwxrwxrwx   - cloudera supergroup          0 2017-03-26 13:13 /user/hive/warehouse/t/a=1/b=2/c=3
> drwxr-xr-x   - cloudera supergroup          0 2017-03-26 13:12 /user/hive/warehouse/t/c=3
> drwxr-xr-x   - cloudera supergroup          0 2017-03-26 13:12 /user/hive/warehouse/t/c=3/b=2
> drwxr-xr-x   - cloudera supergroup          0 2017-03-26 13:12 /user/hive/warehouse/t/c=3/b=2/a=1
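
The partitions reported by "show partitions" in the two examples of the issue description are consistent with one plausible reading of the behaviour: the path components are folded into a column-to-value map, a later duplicate column overwrites an earlier one, and the result is rendered in the table's partition key order. The sketch below models only that reading. It is an assumption made for illustration, not the actual metastore code, and {{PartitionNameFoldSketch}} and {{fold}} are hypothetical names.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the symptom described in the issue.
// Assumption: components are matched by column name only, ignoring position.
class PartitionNameFoldSketch {

    /** Folds "col=val" path components into a partition name in key order. */
    static String fold(String relPath, List<String> partCols) {
        Map<String, String> spec = new HashMap<>();
        for (String part : relPath.split("/")) {
            String[] kv = part.split("=", 2);
            if (kv.length == 2 && partCols.contains(kv[0])) {
                spec.put(kv[0], kv[1]); // a later duplicate overwrites an earlier one
            }
        }
        StringBuilder sb = new StringBuilder();
        for (String col : partCols) {
            if (!spec.containsKey(col)) {
                continue;
            }
            if (sb.length() > 0) {
                sb.append('/');
            }
            sb.append(col).append('=').append(spec.get(col));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> cols = Arrays.asList("a", "b", "c");
        System.out.println(fold("a=1/a=2/a=3/b=4/c=5", cols)); // prints a=3/b=4/c=5
        System.out.println(fold("c=3/b=2/a=1", cols));         // prints a=1/b=2/c=3
    }
}
```

Under this model both example paths are accepted and mapped to partitions whose directories differ from the ones actually on disk, which matches the extra a=3/b=4/c=5 and a=1/b=2/c=3 directories in the listings above.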



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
