[ 
https://issues.apache.org/jira/browse/HIVE-7578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Main updated HIVE-7578:
-----------------------------

    Description: 
Hive does not return the correct results when you run "max()" on a table that 
has been partitioned on a numeric column when more than 10 partitions are 
present and they are numbered 1 - 10.  The same thing happens if the numbers 
are in the range of 1 - 100.  It appears as if Hive is using a string-based 
sort algorithm instead of honoring the numeric data type and applying a true 
numeric sort.

The steps to recreate this issue are included below.

First, put the file below into HDFS at '/tmp/hive_bug/people_flat/people.dat'.  
(Leave lines with leading dashes out)

----- File contents -----
1,David
2,Jeff
3,Cindy
4,Prakash
5,Kate
6,Chung
7,Ginny
8,Huy
9,Brett
10,Jennifer
11,Dan
12,Shivani
13,Nate
----- EOF -----

Now run these statements in either Hive or beeline:

drop table if exists people_flat;

create external table people_flat (
    pnum int, 
    pname string
)
row format delimited fields terminated by ','
location '/tmp/hive_bug/people_flat';

select * from people_flat order by plum;
-- order of records should be correct

select max(pnum) from people_flat;
-- 13 (this is correct)

create table people_partitioned (
    pname string
)
partitioned by (pnum int)
row format delimited fields terminated by ',';

SET hive.exec.max.dynamic.partitions=10000;
SET hive.exec.max.dynamic.partitions.pernode=10000;
SET hive.exec.dynamic.partition.mode=nonstrict;

insert into table people_partitioned
partition (pnum)
select pname, 
    pnum
from people_flat;

select max(pnum) from people_partitioned;
-- result 9 is wrong!

select * from people_partitioned order by pnum;
-- records not sorted correctly in numeric order

-- Although this is not a solution, the next query yield correct results

select max(pnum_int)
from (
    select cast(pnum as int) pnum_int
    from people_partitioned
) a;



  was:
Hive does not return the correct results when you run "max()" on a table that 
has been partitioned on a numeric column when more than 10 partitions are 
present and they are numbered 1 - 10.  The same thing happens if the numbers 
are in the range of 1 - 100.

The steps to recreate this issue are included below.

First, put the file below into HDFS at '/tmp/hive_bug/people_flat/people.dat'.  
(Leave lines with leading dashes out)

----- File contents -----
1,David
2,Jeff
3,Cindy
4,Prakash
5,Kate
6,Chung
7,Ginny
8,Huy
9,Brett
10,Jennifer
11,Dan
12,Shivani
13,Nate
----- EOF -----

Now run these statements in either Hive or beeline:

drop table if exists people_flat;

create external table people_flat (
    pnum int, 
    pname string
)
row format delimited fields terminated by ','
location '/tmp/hive_bug/people_flat';

select * from people_flat order by plum;
-- order of records should be correct

select max(pnum) from people_flat;
-- 13 (this is correct)

create table people_partitioned (
    pname string
)
partitioned by (pnum int)
row format delimited fields terminated by ',';

SET hive.exec.max.dynamic.partitions=10000;
SET hive.exec.max.dynamic.partitions.pernode=10000;
SET hive.exec.dynamic.partition.mode=nonstrict;

insert into table people_partitioned
partition (pnum)
select pname, 
    pnum
from people_flat;

select max(pnum) from people_partitioned;
-- result 9 is wrong!

select * from people_partitioned order by pnum;
-- records not sorted correctly in numeric order

-- Although this is not a solution, the next query yield correct results

select max(pnum_int)
from (
    select cast(pnum as int) pnum_int
    from people_partitioned
) a;




> Sorting and max() against numeric partition column does not work
> ----------------------------------------------------------------
>
>                 Key: HIVE-7578
>                 URL: https://issues.apache.org/jira/browse/HIVE-7578
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.10.0, 0.12.0
>         Environment: Tested in a linux environment
>            Reporter: David Main
>
> Hive does not return the correct results when you run "max()" on a table that 
> has been partitioned on a numeric column when more than 10 partitions are 
> present and they are numbered 1 - 10.  The same thing happens if the numbers 
> are in the range of 1 - 100.  It appears as if Hive is using a string-based 
> sort algorithm instead of honoring the numeric data type and applying a true 
> numeric sort.
> The steps to recreate this issue are included below.
> First, put the file below into HDFS at 
> '/tmp/hive_bug/people_flat/people.dat'.  
> (Leave lines with leading dashes out)
> ----- File contents -----
> 1,David
> 2,Jeff
> 3,Cindy
> 4,Prakash
> 5,Kate
> 6,Chung
> 7,Ginny
> 8,Huy
> 9,Brett
> 10,Jennifer
> 11,Dan
> 12,Shivani
> 13,Nate
> ----- EOF -----
> Now run these statements in either Hive or beeline:
> drop table if exists people_flat;
> create external table people_flat (
>     pnum int, 
>     pname string
> )
> row format delimited fields terminated by ','
> location '/tmp/hive_bug/people_flat';
> select * from people_flat order by plum;
> -- order of records should be correct
> select max(pnum) from people_flat;
> -- 13 (this is correct)
> create table people_partitioned (
>     pname string
> )
> partitioned by (pnum int)
> row format delimited fields terminated by ',';
> SET hive.exec.max.dynamic.partitions=10000;
> SET hive.exec.max.dynamic.partitions.pernode=10000;
> SET hive.exec.dynamic.partition.mode=nonstrict;
> insert into table people_partitioned
> partition (pnum)
> select pname, 
>     pnum
> from people_flat;
> select max(pnum) from people_partitioned;
> -- result 9 is wrong!
> select * from people_partitioned order by pnum;
> -- records not sorted correctly in numeric order
> -- Although this is not a solution, the next query yield correct results
> select max(pnum_int)
> from (
>     select cast(pnum as int) pnum_int
>     from people_partitioned
> ) a;



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to