[
https://issues.apache.org/jira/browse/HIVE-14511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15421763#comment-15421763
]
Subramanyam Pattipaka edited comment on HIVE-14511 at 8/15/16 10:07 PM:
------------------------------------------------------------------------
[~sershe], even if we introduce another command that is flexible enough to
cater to this scenario, what if the user's data has changed in terms of
directory structure? Why should the user have to recreate all the tables again?
Why not make repair table flexible as well (with this patch), so that the
configs mapred.input.dir.recursive and hive.mapred.supports.subdirectories are
supported and the relevant partitions are added? Further, having two commands
may be confusing.
I don't mean to add a file such as a=1/000000_0 here. I mean only to ignore
such files and list them in the error log if a config is enabled, so that users
can act on them. Error is better than debug: this way, all configurations would
give these details. For example, if we have the following files
tbldir/a=1/file1.txt
tbldir/a=2/b=1/file2.txt
tbldir/a=2/b=1/c=1/file3.txt
and we are trying to create a partitioned table with partitions on a and b,
with root directory tbldir, then the ERROR log would say that the file
tbldir/a=1/file1.txt is being ignored due to incorrect structure, if the ignore
config is set. Otherwise, the operation fails.
We add only one partition, with values (2, 1).
msck is still strict; the ask here is only to support the configs
mapred.input.dir.recursive and hive.mapred.supports.subdirectories.
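
To make the example concrete, here is a rough sketch of the behaviour I am
asking for. It is not Hive code: find_partitions and its ignore_invalid flag
are made-up names for illustration, and it assumes subdirectories below the
last partition folder are allowed (as with the two configs above).

```python
import logging
from pathlib import PurePosixPath

log = logging.getLogger("msck_sketch")


def find_partitions(files, root, part_cols, ignore_invalid=True):
    """Hypothetical helper: map file paths to partition value tuples.

    A file counts for a partition when its leading directories match
    part_cols in order (col=value). Anything deeper than the last
    partition folder is treated as a plain subdirectory.
    """
    partitions = set()
    ignored = []
    for f in files:
        rel = PurePosixPath(f).relative_to(root)
        dirs = rel.parts[:-1]  # directory components, file name dropped
        spec = []
        for col, d in zip(part_cols, dirs):
            name, sep, value = d.partition("=")
            if sep != "=" or name != col:
                break
            spec.append(value)
        if len(spec) == len(part_cols):
            partitions.add(tuple(spec))
        elif ignore_invalid:
            # Logged at ERROR, not DEBUG, so it shows up in every setup.
            log.error("ignoring file %s due to incorrect structure", f)
            ignored.append(f)
        else:
            raise ValueError("incorrect partition structure: %s" % f)
    return partitions, ignored


files = [
    "tbldir/a=1/file1.txt",
    "tbldir/a=2/b=1/file2.txt",
    "tbldir/a=2/b=1/c=1/file3.txt",
]
partitions, ignored = find_partitions(files, "tbldir", ["a", "b"])
```

With the three files above, only the partition (2, 1) is found and
tbldir/a=1/file1.txt is reported as ignored. With ignore_invalid=False the
same input fails instead.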
was (Author: pattipaka):
[~sershe], even if we introduce another command that is flexible enough to
cater to this scenario, what if the user's data has changed in terms of
directory structure? Why should the user have to recreate all the tables again?
Why not make repair table flexible as well (with this patch), so that the
configs mapred.input.dir.recursive and hive.mapred.supports.subdirectories are
supported and the relevant partitions are added? Further, having two commands
may be confusing.
I don't mean to add a file such as a=1/000000_0 here. I mean only to ignore
such files and list them in the error log if a config is enabled, so that users
can act on them. Error is better than debug: this way, all configurations would
give these details. For example, if we have the following files
tbldir/a=1/file1.txt
tbldir/a=2/b=1/file2.txt
and we are trying to create a partitioned table with partitions on a and b,
with root directory tbldir, then the ERROR log would say that the file
tbldir/a=1/file1.txt is being ignored due to incorrect structure, if the ignore
config is set. Otherwise, the operation fails.
We add only one partition, with values (2, 1).
msck is still strict; the ask here is only to support the configs
mapred.input.dir.recursive and hive.mapred.supports.subdirectories.
> Improve MSCK for partitioned table to deal with special cases
> -------------------------------------------------------------
>
> Key: HIVE-14511
> URL: https://issues.apache.org/jira/browse/HIVE-14511
> Project: Hive
> Issue Type: Sub-task
> Reporter: Pengcheng Xiong
> Assignee: Pengcheng Xiong
> Attachments: HIVE-14511.01.patch
>
>
> Some users will have a folder rather than a file under the last partition
> folder. However, msck is going to search for the leaf folder rather than the
> last partition folder. We need to improve that.