Deepesh Khandelwal created HIVE-8062:
----------------------------------------

             Summary: Stats collection for columns fails on a partitioned table with null values in partitioning column
                 Key: HIVE-8062
                 URL: https://issues.apache.org/jira/browse/HIVE-8062
             Project: Hive
          Issue Type: Bug
          Components: Statistics
    Affects Versions: 0.14.0
            Reporter: Deepesh Khandelwal


Steps to reproduce:
1. Create a data file abc.txt with the following contents:
{noformat}
a,1
b,
{noformat}
2. Use the Hive CLI to create an unpartitioned table, load the file into it, and populate a partitioned table from it with a dynamic-partition insert:
{noformat}
hive> create table abc(a string, b int);
OK
Time taken: 0.272 seconds
hive> load data local inpath 'abc.txt' into table abc;
Loading data to table default.abc
Table default.abc stats: [numFiles=1, numRows=0, totalSize=7, rawDataSize=0]
OK
Time taken: 0.463 seconds
hive> create table abc1(a string) partitioned by (b int);
OK
Time taken: 0.098 seconds
hive> set hive.exec.dynamic.partition.mode=nonstrict;
hive> insert overwrite table abc1 partition (b) select a, b from abc;
Query ID = hrt_qa_20140911210909_1200fae7-1e18-4e0d-b74f-040453c27cff
Total jobs = 1
Launching Job 1 out of 1


Status: Running (application id: Executing on YARN cluster with App id application_1410457588978_0063)

Map 1: -/-      Reducer 2: 0/1
Map 1: 0/1      Reducer 2: 0/1
Map 1: 0(+1)/1  Reducer 2: 0/1
Map 1: 1/1      Reducer 2: 0(+1)/1
Map 1: 1/1      Reducer 2: 0/1
Map 1: 1/1      Reducer 2: 1/1
Status: Finished successfully
Loading data to table default.abc1 partition (b=null)
        Loading partition {b=__HIVE_DEFAULT_PARTITION__}
Partition default.abc1{b=__HIVE_DEFAULT_PARTITION__} stats: [numFiles=1, numRows=2, totalSize=7, rawDataSize=5]
OK
Time taken: 7.49 seconds
{noformat}
3. Run the analyze command to compute column statistics on the partitioned table:
{noformat}
hive> analyze table abc1 partition (b) compute statistics for columns;
Query ID = hrt_qa_20140911211010_440bdb4a-6a0d-496b-9d2e-5fc84db3d0ee
Total jobs = 1
Launching Job 1 out of 1


Status: Running (application id: Executing on YARN cluster with App id application_1410457588978_0063)

Map 1: 0(+1)/1  Reducer 2: 0/1
Map 1: 1/1      Reducer 2: 0(+1)/1
Map 1: 1/1      Reducer 2: 1/1
Status: Finished successfully
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.ColumnStatsTask
{noformat}
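The analyze command for column statistics fails in ColumnStatsTask with return code 1, even though the job itself reports "Finished successfully".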
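
For convenience, the steps above can be collected into a script that runs without the interactive prompt. This is only a sketch: the scratch directory and the repro.hql file name are made up for illustration, and it assumes a hive client on the PATH that accepts a script file via -f. The HiveQL statements are the ones from steps 2 and 3, unchanged.

```shell
#!/bin/sh
# Sketch of the reproduction steps as a non-interactive script.
set -eu

# Hypothetical scratch directory; any writable location would do.
workdir=$(mktemp -d)

# Step 1: the two-line data file (7 bytes, matching totalSize=7 in the log).
printf 'a,1\nb,\n' > "$workdir/abc.txt"

# Steps 2 and 3: the same statements that were typed at the hive> prompt.
cat > "$workdir/repro.hql" <<EOF
create table abc(a string, b int);
load data local inpath '$workdir/abc.txt' into table abc;
create table abc1(a string) partitioned by (b int);
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table abc1 partition (b) select a, b from abc;
analyze table abc1 partition (b) compute statistics for columns;
EOF

# Run only when a hive client is available; the last statement is the one
# expected to fail, so a non-zero exit is reported rather than aborting.
if command -v hive >/dev/null 2>&1; then
    hive -f "$workdir/repro.hql" || echo "hive exited with status $?"
else
    echo "hive not found; statements written to $workdir/repro.hql"
fi
```

Note that table abc is created without a field delimiter, so the comma in abc.txt is probably not treated as a column separator and b is likely null for both rows. That would be consistent with the log above, which shows a single __HIVE_DEFAULT_PARTITION__ partition with numRows=2.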


