[jira] [Commented] (HIVE-4333) most windowing tests fail on hadoop 2

Harish Butani (JIRA) Thu, 18 Apr 2013 10:19:20 -0700

    [ https://issues.apache.org/jira/browse/HIVE-4333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13635362#comment-13635362 ]


Harish Butani commented on HIVE-4333:
-------------------------------------

Attached a patch. The changes fall into these categories:

- Some queries had 'partition by p_mfgr order by p_mfgr' or just 'partition by 
p_mfgr'. In these cases the rows within a partition do not arrive in the same 
order as in hadoop 1. Changed to 'partition by p_mfgr order by p_name'.
- Manufacturer 1 has 2 rows with exactly the same data, so with a 'row based 
window' there are diffs between hadoop 1 and 2. Changed to a 'range based 
window'.
- There are diffs because of precision. Some of the avg and sum functions are 
now wrapped in 'round'.
- Finally, tests with an empty over() on functions that rely on order had to be 
changed. For example, leadlag.q Query 8. I tried the following change:
{noformat}
select p_name, p_retailprice,
lead(p_retailprice) over() as l1 ,
lag(p_retailprice)  over() as l2
from (select p_name, p_retailprice from part where p_mfgr = 'Manufacturer#1' 
order by p_name, p_retailprice ) p;
{noformat}

The output in hadoop 1 is:
{noformat}
almond antique burnished rose metallic  1173.15 1173.15 NULL
almond antique burnished rose metallic  1173.15 1753.76 1173.15
almond antique chartreuse lavender yellow       1753.76 1602.59 1173.15
almond antique salmon chartreuse burlywood      1602.59 1414.42 1753.76
almond aquamarine burnished black steel 1414.42 1632.66 1602.59
almond aquamarine pink moccasin thistle 1632.66 NULL    1414.42
{noformat}

The input to the lead and lag query is ordered on p_name and p_retailprice and 
is very small, just 6 rows (so only 1 mapper is involved). In hadoop 1.0 the 
rows reach the reducer in the same order as the input.


In hadoop 2.0 the result is:
{noformat}
almond aquamarine pink moccasin thistle 1632.66 1414.42 NULL
almond aquamarine burnished black steel 1414.42 1602.59 1632.66
almond antique salmon chartreuse burlywood      1602.59 1753.76 1414.42
almond antique chartreuse lavender yellow       1753.76 1173.15 1602.59
almond antique burnished rose metallic  1173.15 1173.15 1753.76
almond antique burnished rose metallic  1173.15 NULL    1173.15
{noformat}

It looks like the shuffle in hadoop 2.0 reorders the rows even in this case.
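
To make the order dependence concrete, here is a rough Python sketch (not Hive code). It assumes that lead and lag over an empty over() simply look one row ahead or behind in whatever order the rows arrive; ROWS and lead_lag are names made up for this illustration. Under that assumption, feeding the 6 rows in input order gives the hadoop 1 output above, and feeding them in exactly reversed order gives the hadoop 2 output.

```python
# Illustrative model only: lead()/lag() with an empty over() are assumed to
# look one row ahead / one row behind in the order the rows arrive.

# The 6 Manufacturer#1 rows, ordered by p_name, p_retailprice (from the comment).
ROWS = [
    ("almond antique burnished rose metallic", 1173.15),
    ("almond antique burnished rose metallic", 1173.15),
    ("almond antique chartreuse lavender yellow", 1753.76),
    ("almond antique salmon chartreuse burlywood", 1602.59),
    ("almond aquamarine burnished black steel", 1414.42),
    ("almond aquamarine pink moccasin thistle", 1632.66),
]


def lead_lag(rows):
    """Return (p_name, p_retailprice, lead, lag) for each row, in arrival order.

    None stands in for SQL NULL at the partition boundaries.
    """
    out = []
    for i, (name, price) in enumerate(rows):
        lead = rows[i + 1][1] if i + 1 < len(rows) else None
        lag = rows[i - 1][1] if i > 0 else None
        out.append((name, price, lead, lag))
    return out


# Rows arriving in input order (what hadoop 1 appears to do here).
in_order = lead_lag(ROWS)

# Rows arriving in reversed order (matches the hadoop 2 output here).
reversed_order = lead_lag(list(reversed(ROWS)))

if __name__ == "__main__":
    for label, result in (("input order", in_order), ("reversed", reversed_order)):
        print(label)
        for row in result:
            print("  ", row)
```

Since the query gives over() no ordering of its own, neither result is wrong as such. The tests only become stable when the window itself specifies the order, which is what the other changes in the patch do.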
                
> most windowing tests fail on hadoop 2
> -------------------------------------
>
>                 Key: HIVE-4333
>                 URL: https://issues.apache.org/jira/browse/HIVE-4333
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Gunther Hagleitner
>            Assignee: Matthew Weaver
>         Attachments: HIVE-4333.1.patch.txt
>
>
> Problem is different order of results on hadoop 2

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
