Deepesh Khandelwal created HIVE-8062:
----------------------------------------
Summary: Stats collection for columns fails on a partitioned table with null values in the partitioning column
Key: HIVE-8062
URL: https://issues.apache.org/jira/browse/HIVE-8062
Project: Hive
Issue Type: Bug
Components: Statistics
Affects Versions: 0.14.0
Reporter: Deepesh Khandelwal
Steps to reproduce:
1. Create a data file abc.txt with the following contents:
{noformat}
a,1
b,
{noformat}
2. Use the Hive CLI to create and load a staging table, then populate the partitioned table from it with a dynamic-partition insert:
{noformat}
hive> create table abc(a string, b int);
OK
Time taken: 0.272 seconds
hive> load data local inpath 'abc.txt' into table abc;
Loading data to table default.abc
Table default.abc stats: [numFiles=1, numRows=0, totalSize=7, rawDataSize=0]
OK
Time taken: 0.463 seconds
hive> create table abc1(a string) partitioned by (b int);
OK
Time taken: 0.098 seconds
hive> set hive.exec.dynamic.partition.mode=nonstrict;
hive> insert overwrite table abc1 partition (b) select a, b from abc;
Query ID = hrt_qa_20140911210909_1200fae7-1e18-4e0d-b74f-040453c27cff
Total jobs = 1
Launching Job 1 out of 1
Status: Running (application id: Executing on YARN cluster with App id
application_1410457588978_0063)
Map 1: -/- Reducer 2: 0/1
Map 1: 0/1 Reducer 2: 0/1
Map 1: 0(+1)/1 Reducer 2: 0/1
Map 1: 1/1 Reducer 2: 0(+1)/1
Map 1: 1/1 Reducer 2: 0/1
Map 1: 1/1 Reducer 2: 1/1
Status: Finished successfully
Loading data to table default.abc1 partition (b=null)
Loading partition {b=__HIVE_DEFAULT_PARTITION__}
Partition default.abc1{b=__HIVE_DEFAULT_PARTITION__} stats: [numFiles=1,
numRows=2, totalSize=7, rawDataSize=5]
OK
Time taken: 7.49 seconds
{noformat}
3. Run the analyze command to compute column statistics on the partitioned table:
{noformat}
hive> analyze table abc1 partition (b) compute statistics for columns;
Query ID = hrt_qa_20140911211010_440bdb4a-6a0d-496b-9d2e-5fc84db3d0ee
Total jobs = 1
Launching Job 1 out of 1
Status: Running (application id: Executing on YARN cluster with App id
application_1410457588978_0063)
Map 1: 0(+1)/1 Reducer 2: 0/1
Map 1: 1/1 Reducer 2: 0(+1)/1
Map 1: 1/1 Reducer 2: 1/1
Status: Finished successfully
FAILED: Execution Error, return code 1 from
org.apache.hadoop.hive.ql.exec.ColumnStatsTask
{noformat}
The query job itself finishes successfully, but the column statistics step (ColumnStatsTask) then fails with return code 1.
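
For convenience, the steps above can be replayed non-interactively with something like the following sketch. It assumes a `hive` client is on the PATH and uses the standard `hive -f` option to run a script file. The script name `repro.hql` is made up for illustration; the statements are copied from the transcript above. Note that, according to the log above, both rows end up in the `__HIVE_DEFAULT_PARTITION__` partition.

```shell
#!/bin/sh
# Sketch only: replays the reproduction steps from this report.
# Assumption: "repro.hql" is an arbitrary file name chosen for this example.
set -eu

# Step 1: data file (7 bytes); the second row has no value after the comma.
printf 'a,1\nb,\n' > abc.txt

# Steps 2 and 3: the same statements as in the CLI transcript.
cat > repro.hql <<'EOF'
create table abc(a string, b int);
load data local inpath 'abc.txt' into table abc;
create table abc1(a string) partitioned by (b int);
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table abc1 partition (b) select a, b from abc;
analyze table abc1 partition (b) compute statistics for columns;
EOF

# Run the script only when a hive client is available.
if command -v hive >/dev/null 2>&1; then
  hive -f repro.hql || echo "hive exited with a non-zero status"
else
  echo "hive not found; wrote abc.txt and repro.hql only"
fi
```

On an affected build, the last statement in `repro.hql` is expected to be the one that reports the ColumnStatsTask error shown in step 3.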