[ https://issues.apache.org/jira/browse/HIVE-7578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
David Main updated HIVE-7578: ----------------------------- Description: Hive does not return the correct results when you run "max()" on a table that has been partitioned on a numeric column when more than 10 partitions are present and they are numbered 1 - 10. The same thing happens if the numbers are in the range of 1 - 100. It appears as if Hive is using a string-based sort algorithm instead of honoring the numeric data type and applying a true numeric sort. The steps to recreate this issue are included below. First, put the file below into HDFS at '/tmp/hive_bug/people_flat/people.dat'. (Leave lines with leading dashes out) ----- File contents ----- 1,David 2,Jeff 3,Cindy 4,Prakash 5,Kate 6,Chung 7,Ginny 8,Huy 9,Brett 10,Jennifer 11,Dan 12,Shivani 13,Nate ----- EOF ----- Now run these statements in either Hive or beeline: drop table if exists people_flat; create external table people_flat ( pnum int, pname string ) row format delimited fields terminated by ',' location '/tmp/hive_bug/people_flat'; select * from people_flat order by plum; -- order of records should be correct select max(pnum) from people_flat; -- 13 (this is correct) create table people_partitioned ( pname string ) partitioned by (pnum int) row format delimited fields terminated by ','; SET hive.exec.max.dynamic.partitions=10000; SET hive.exec.max.dynamic.partitions.pernode=10000; SET hive.exec.dynamic.partition.mode=nonstrict; insert into table people_partitioned partition (pnum) select pname, pnum from people_flat; select max(pnum) from people_partitioned; -- result 9 is wrong! select * from people_partitioned order by pnum; -- records not sorted correctly in numeric order -- Although this is not a solution, the next query yield correct results select max(pnum_int) from ( select cast(pnum as int) pnum_int from people_partitioned ) a; was: Hive does not return the correct results when you run "max()" on a table that has been partitioned on a numeric column when more than 10 partitions are present and they are numbered 1 - 10. The same thing happens if the numbers are in the range of 1 - 100. The steps to recreate this issue are included below. First, put the file below into HDFS at '/tmp/hive_bug/people_flat/people.dat'. (Leave lines with leading dashes out) ----- File contents ----- 1,David 2,Jeff 3,Cindy 4,Prakash 5,Kate 6,Chung 7,Ginny 8,Huy 9,Brett 10,Jennifer 11,Dan 12,Shivani 13,Nate ----- EOF ----- Now run these statements in either Hive or beeline: drop table if exists people_flat; create external table people_flat ( pnum int, pname string ) row format delimited fields terminated by ',' location '/tmp/hive_bug/people_flat'; select * from people_flat order by plum; -- order of records should be correct select max(pnum) from people_flat; -- 13 (this is correct) create table people_partitioned ( pname string ) partitioned by (pnum int) row format delimited fields terminated by ','; SET hive.exec.max.dynamic.partitions=10000; SET hive.exec.max.dynamic.partitions.pernode=10000; SET hive.exec.dynamic.partition.mode=nonstrict; insert into table people_partitioned partition (pnum) select pname, pnum from people_flat; select max(pnum) from people_partitioned; -- result 9 is wrong! select * from people_partitioned order by pnum; -- records not sorted correctly in numeric order -- Although this is not a solution, the next query yield correct results select max(pnum_int) from ( select cast(pnum as int) pnum_int from people_partitioned ) a; > Sorting and max() against numeric partition column does not work > ---------------------------------------------------------------- > > Key: HIVE-7578 > URL: https://issues.apache.org/jira/browse/HIVE-7578 > Project: Hive > Issue Type: Bug > Components: Query Processor > Affects Versions: 0.10.0, 0.12.0 > Environment: Tested in a linux environment > Reporter: David Main > > Hive does not return the correct results when you run "max()" on a table that > has been partitioned on a numeric column when more than 10 partitions are > present and they are numbered 1 - 10. The same thing happens if the numbers > are in the range of 1 - 100. It appears as if Hive is using a string-based > sort algorithm instead of honoring the numeric data type and applying a true > numeric sort. > The steps to recreate this issue are included below. > First, put the file below into HDFS at > '/tmp/hive_bug/people_flat/people.dat'. > (Leave lines with leading dashes out) > ----- File contents ----- > 1,David > 2,Jeff > 3,Cindy > 4,Prakash > 5,Kate > 6,Chung > 7,Ginny > 8,Huy > 9,Brett > 10,Jennifer > 11,Dan > 12,Shivani > 13,Nate > ----- EOF ----- > Now run these statements in either Hive or beeline: > drop table if exists people_flat; > create external table people_flat ( > pnum int, > pname string > ) > row format delimited fields terminated by ',' > location '/tmp/hive_bug/people_flat'; > select * from people_flat order by plum; > -- order of records should be correct > select max(pnum) from people_flat; > -- 13 (this is correct) > create table people_partitioned ( > pname string > ) > partitioned by (pnum int) > row format delimited fields terminated by ','; > SET hive.exec.max.dynamic.partitions=10000; > SET hive.exec.max.dynamic.partitions.pernode=10000; > SET hive.exec.dynamic.partition.mode=nonstrict; > insert into table people_partitioned > partition (pnum) > select pname, > pnum > from people_flat; > select max(pnum) from people_partitioned; > -- result 9 is wrong! > select * from people_partitioned order by pnum; > -- records not sorted correctly in numeric order > -- Although this is not a solution, the next query yield correct results > select max(pnum_int) > from ( > select cast(pnum as int) pnum_int > from people_partitioned > ) a; -- This message was sent by Atlassian JIRA (v6.2#6252)