[ https://issues.apache.org/jira/browse/HIVE-4333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13635362#comment-13635362 ]
Harish Butani commented on HIVE-4333:
-------------------------------------

Attached a patch. The changes fall into these categories:
- Some queries had 'partition by p_mfgr order by p_mfgr' or just 'partition by p_mfgr'. In these cases the rows within a partition do not arrive in the same order as on Hadoop 1. Changed to 'partition by p_mfgr order by p_name'.
- Manufacturer#1 has 2 rows with exactly the same data, so with a 'row based window' there are diffs between Hadoop 1 and 2. Changed to a 'range based window'.
- There are diffs because of precision. Some of the avg and sum functions are now wrapped in 'round'.
- Finally, tests with the empty over() on functions that rely on order had to be changed.

For example, leadlag.q Query 8. I tried the following change:
{noformat}
select p_name, p_retailprice,
       lead(p_retailprice) over() as l1,
       lag(p_retailprice) over() as l2
from (select p_name, p_retailprice
      from part
      where p_mfgr = 'Manufacturer#1'
      order by p_name, p_retailprice
     ) p;
{noformat}

The output in Hadoop 1 is:
{noformat}
almond antique burnished rose metallic      1173.15  1173.15  NULL
almond antique burnished rose metallic      1173.15  1753.76  1173.15
almond antique chartreuse lavender yellow   1753.76  1602.59  1173.15
almond antique salmon chartreuse burlywood  1602.59  1414.42  1753.76
almond aquamarine burnished black steel     1414.42  1632.66  1602.59
almond aquamarine pink moccasin thistle     1632.66  NULL     1414.42
{noformat}

The input to the lead and lag query is ordered on p_name and p_retailprice and is very small, just 6 rows (so only 1 mapper is involved). In Hadoop 1.0 the rows reach the reducer in the same order as the input.

In Hadoop 2.0 the result is:
{noformat}
almond aquamarine pink moccasin thistle     1632.66  1414.42  NULL
almond aquamarine burnished black steel     1414.42  1602.59  1632.66
almond antique salmon chartreuse burlywood  1602.59  1753.76  1414.42
almond antique chartreuse lavender yellow   1753.76  1173.15  1602.59
almond antique burnished rose metallic      1173.15  1173.15  1753.76
almond antique burnished rose metallic      1173.15  NULL     1173.15
{noformat}

Looks like the shuffle in Hadoop 2.0 reorders the rows even in this case.

> most windowing tests fail on hadoop 2
> -------------------------------------
>
>                 Key: HIVE-4333
>                 URL: https://issues.apache.org/jira/browse/HIVE-4333
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Gunther Hagleitner
>            Assignee: Matthew Weaver
>         Attachments: HIVE-4333.1.patch.txt
>
>
> Problem is different order of results on hadoop 2
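
The lead/lag difference reported in the comment can be reproduced outside Hive with a small sketch. It is only an illustration, not Hive's implementation: `lead_lag` is a hypothetical helper that treats all rows as one window (as an empty over() does) and computes lead and lag in whatever order the rows arrive. Modelling the Hadoop 2 shuffle as a plain reversal is an assumption taken from the posted output, and the final `sorted` call stands in for giving the window an explicit ordering, which is also an assumption of the sketch rather than something the patch is stated to do.

```python
from typing import List, Optional, Tuple

Row = Tuple[str, float]
Result = Tuple[str, float, Optional[float], Optional[float]]

# The six Manufacturer#1 rows from the comment, ordered by p_name, p_retailprice.
ROWS: List[Row] = [
    ("almond antique burnished rose metallic", 1173.15),
    ("almond antique burnished rose metallic", 1173.15),
    ("almond antique chartreuse lavender yellow", 1753.76),
    ("almond antique salmon chartreuse burlywood", 1602.59),
    ("almond aquamarine burnished black steel", 1414.42),
    ("almond aquamarine pink moccasin thistle", 1632.66),
]


def lead_lag(rows: List[Row]) -> List[Result]:
    """Hypothetical helper: lead/lag with offset 1 over a single window,
    using the arrival order of the rows (no ordering of its own)."""
    out: List[Result] = []
    for i, (name, price) in enumerate(rows):
        lead = rows[i + 1][1] if i + 1 < len(rows) else None  # next row's price
        lag = rows[i - 1][1] if i > 0 else None               # previous row's price
        out.append((name, price, lead, lag))
    return out


# Rows arrive in input order, as observed on Hadoop 1.
hadoop1_like = lead_lag(ROWS)

# Assumption: the reordering seen on Hadoop 2 is modelled as a reversal.
arrived_reversed = list(reversed(ROWS))
hadoop2_like = lead_lag(arrived_reversed)

# Sorting inside the computation makes the result independent of arrival order.
deterministic = lead_lag(sorted(arrived_reversed))

if __name__ == "__main__":
    for row in hadoop2_like:
        print(row)
```

Because the two duplicate rows are identical, sorting on (p_name, p_retailprice) gives the same result whichever of them comes first, so `deterministic` matches the Hadoop 1 output regardless of how the rows were shuffled.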