[
https://issues.apache.org/jira/browse/HIVE-13697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15281695#comment-15281695
]
Oleksiy Sayankin edited comment on HIVE-13697 at 1/23/18 2:16 PM:
------------------------------------------------------------------
*ROOT-CAUSE:*
toLowerCase() is called on the skewed values when they are read from the AST
node in BaseSemanticAnalyzer, so skewed values are always stored in lowercase.
{code:java}
hive> desc formatted testskew2;
OK
# col_name data_type comment
id int
a string
# Detailed Table Information
Database: default
Owner: hdfs
CreateTime: Thu May 12 18:37:20 EEST 2016
LastAccessTime: UNKNOWN
Protect Mode: None
Retention: 0
Location: hdfs:/user/hive/warehouse/testskew2
Table Type: MANAGED_TABLE
Table Parameters:
transient_lastDdlTime 1463067440
# Storage Information
SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat: org.apache.hadoop.mapred.TextInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed: No
Num Buckets: -1
Bucket Columns: []
Sort Columns: []
Stored As SubDirectories: Yes
Skewed Columns: [a]
Skewed Values: [[aus], [us]] <---- !!! ERROR !!!
Storage Desc Params:
serialization.format 1
{code}
*SOLUTION:*
Remove the unnecessary toLowerCase() call.
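To illustrate why lowercasing the stored value loses the 'US' directory, here is a minimal, self-contained sketch. It is not the Hive code: the class and both helper methods are hypothetical, and it assumes that a row value is compared case-sensitively against the stored skewed values, which is consistent with the directory listing in the issue description but is not taken from the patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class SkewedValueCaseDemo {

    // Hypothetical stand-in for reading skewed values from the parsed DDL.
    // lowerCase = true mimics the behaviour described in ROOT-CAUSE;
    // lowerCase = false mimics the behaviour after the proposed fix.
    static List<String> readSkewedValues(List<String> literals, boolean lowerCase) {
        List<String> values = new ArrayList<>();
        for (String literal : literals) {
            values.add(lowerCase ? literal.toLowerCase(Locale.ROOT) : literal);
        }
        return values;
    }

    // Hypothetical routing rule (an assumption): a row gets its own
    // subdirectory only if its value exactly matches a stored skewed value;
    // otherwise it falls into the default list-bucketing directory.
    static String directoryFor(String column, String rowValue, List<String> skewedValues) {
        return skewedValues.contains(rowValue)
                ? column + "=" + rowValue
                : "HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME";
    }

    public static void main(String[] args) {
        // Values from SKEWED BY (a) ON ('aus', 'US') in the issue description.
        List<String> declared = Arrays.asList("aus", "US");
        List<String> stored = readSkewedValues(declared, true);
        List<String> fixed = readSkewedValues(declared, false);

        for (String row : Arrays.asList("aus", "US", "others")) {
            System.out.println(row
                    + " -> with toLowerCase(): " + directoryFor("a", row, stored)
                    + ", without: " + directoryFor("a", row, fixed));
        }
    }
}
```

Under these assumptions the row 'US' no longer matches the stored value 'us', so it ends up in the default directory and no a=US directory is created, which matches what the reporter observed.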
> ListBucketing feature does not support uppercase string.
> --------------------------------------------------------
>
> Key: HIVE-13697
> URL: https://issues.apache.org/jira/browse/HIVE-13697
> Project: Hive
> Issue Type: Bug
> Components: Database/Schema
> Affects Versions: 1.2.1
> Environment: 1.2.1
> Reporter: Hao Zhu
> Assignee: Oleksiy Sayankin
> Priority: Critical
> Fix For: 3.0.0
>
> Attachments: HIVE-13697.1.patch
>
>
> This is the feature:
> https://cwiki.apache.org/confluence/display/Hive/ListBucketing
> 1. Good example:
> {code}
> CREATE TABLE testskew (id INT, a STRING)
> SKEWED BY (a) ON ('abc', 'xyz') STORED AS DIRECTORIES;
> set hive.mapred.supports.subdirectories=true;
> set mapred.input.dir.recursive=true;
> INSERT OVERWRITE TABLE testskew
> SELECT 123,'abc' FROM dual
> union all
> SELECT 123,'xyz' FROM dual
> union all
> SELECT 123,'others' FROM dual;
> {code}
> {code}
> # hadoop fs -ls /user/hive/warehouse/testskew
> Found 3 items
> drwxrwxrwx - mapr mapr 1 2016-05-05 14:56 /user/hive/warehouse/testskew/HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME
> drwxrwxrwx - mapr mapr 1 2016-05-05 14:56 /user/hive/warehouse/testskew/a=abc
> drwxrwxrwx - mapr mapr 1 2016-05-05 14:56 /user/hive/warehouse/testskew/a=xyz
> {code}
> This works as expected: both the "abc" and "xyz" directories were created.
> 2. Bad example -- This is the issue
> {code}
> CREATE TABLE testskew2 (id INT, a STRING)
> SKEWED BY (a) ON ('aus', 'US') STORED AS DIRECTORIES;
> set hive.mapred.supports.subdirectories=true;
> set mapred.input.dir.recursive=true;
> INSERT OVERWRITE TABLE testskew2
> SELECT 123, 'aus' FROM dual
> union all
> SELECT 123, 'US' FROM dual
> union all
> SELECT 123, 'others' FROM dual;
> {code}
> As you can see, only the "aus" directory was created; the "US" directory is missing.
> {code}
> # hadoop fs -ls /user/hive/warehouse/testskew2
> Found 2 items
> drwxrwxrwx - mapr mapr 1 2016-05-05 15:11 /user/hive/warehouse/testskew2/HIVE_DEFAULT_LIST_BUCKETING_DIR_NAME
> drwxrwxrwx - mapr mapr 1 2016-05-05 15:11 /user/hive/warehouse/testskew2/a=aus
> {code}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)