[ https://issues.apache.org/jira/browse/HIVE-222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashish Thusoo updated HIVE-222:
-------------------------------

    Attachment: patch-222.txt

Fix for the bug.

There was a bug in the way the aggregation list was being generated for the
map-side aggregation. As a result, the ordering of the aggregations in the
map-side group-by operator and the reduce-side group-by operator could differ,
leading to this problem. Ideally, we should use the row schema information to
generate the order, but that needs a much larger refactor of how we generate
plans in the group-by case. For now, this patch should fix the problem.
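
To make the failure mode concrete, here is a minimal Java sketch. It is not
Hive's operator code. It assumes, for illustration only, that each
aggregation's partial result travels as one positional column, that count
partials are Long and sum partials are Double, and that the consumer casts
each column to the type it expects at that position. The class and method
names (AggregationOrderSketch, mapSide, reduceSide) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not Hive's GroupByOperator code.
class AggregationOrderSketch {

    // Producer (map side): emit one partial value per aggregation, in the given order.
    // Assumption for illustration: count partials are Long, sum partials are Double.
    static List<Object> mapSide(List<String> order) {
        List<Object> row = new ArrayList<>();
        for (String agg : order) {
            if (agg.startsWith("count")) {
                row.add(Long.valueOf(3L));
            } else {
                row.add(Double.valueOf(4.5));
            }
        }
        return row;
    }

    // Consumer: read columns by position, casting each one to the type implied
    // by the aggregation it expects at that position.
    static String reduceSide(List<String> order, List<Object> row) {
        List<String> parts = new ArrayList<>();
        for (int i = 0; i < order.size(); i++) {
            String agg = order.get(i);
            Object v = row.get(i);
            if (agg.startsWith("count")) {
                parts.add(agg + "=" + (Long) v);
            } else {
                parts.add(agg + "=" + (Double) v);
            }
        }
        return String.join(" ", parts);
    }

    public static void main(String[] args) {
        List<String> selectOrder = Arrays.asList("count(DISTINCT y)", "sum(y)");
        List<String> otherOrder = Arrays.asList("sum(y)", "count(DISTINCT y)");

        // Buggy: producer and consumer derive their orders independently.
        try {
            reduceSide(otherOrder, mapSide(selectOrder));
        } catch (ClassCastException e) {
            // A Long (count partial) is read where a Double (sum) was expected.
            System.out.println("order mismatch: " + e.getClass().getSimpleName());
        }

        // Fixed: both sides are driven by the same aggregation list.
        System.out.println(reduceSide(selectOrder, mapSide(selectOrder)));
    }
}
```

In the mismatched case the cast fails with a Long where a Double was expected,
the same shape as the exception in the issue. Making both sides share one list
is the short-term fix described above; generating the order from the row
schema would give both sides a single source of truth in the same spirit.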

There are preexisting tests that cover this (groupby2_map.q and groupby3_map.q).
However, these tests rely on an internal hashmap returning the keys in a
certain order, so they do not reliably expose the bug. The bug was easily
reproducible with the patch in HIVE-179, and I have tested the fix with that
patch applied.
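
Why a hashmap-order-dependent test is fragile can be shown with the standard
library alone: java.util.HashMap makes no guarantee about iteration order,
while java.util.LinkedHashMap iterates in insertion order. The sketch below
(hypothetical class AggregationKeyOrder, illustrative aggregation names)
prints both orders. It makes no claim about which map Hive uses internally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of map iteration order; not Hive code.
class AggregationKeyOrder {

    // Insert keys in the given order, then return them in the order the map iterates.
    static List<String> iterationOrder(Map<String, Integer> aggs, List<String> keys) {
        for (int i = 0; i < keys.size(); i++) {
            aggs.put(keys.get(i), i);
        }
        return new ArrayList<>(aggs.keySet());
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("sum(y)", "count(DISTINCT y)", "avg(z)");

        // HashMap: order follows hash codes and table capacity and is not part of
        // its contract, so a test that passes only under one order is fragile.
        System.out.println(iterationOrder(new HashMap<>(), keys));

        // LinkedHashMap: iteration order is insertion order, by specification.
        System.out.println(iterationOrder(new LinkedHashMap<>(), keys));
    }
}
```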


> Group by on a combination of distinct and non-distinct aggregates can return
> serialization errors with map-side aggregations.
> ------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-222
>                 URL: https://issues.apache.org/jira/browse/HIVE-222
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.2.0
>            Reporter: Ashish Thusoo
>            Assignee: Ashish Thusoo
>            Priority: Blocker
>             Fix For: 0.2.0
>
>         Attachments: patch-222.txt
>
>
> For queries of the form (groupby2_map.q in the source)
> SELECT x, count(DISTINCT y), SUM(y) FROM t GROUP BY x
> when map-side aggregation is on
> (hive.map.aggr=true; this is off by default),
> the following exception can occur:
>     [junit] Caused by: java.lang.ClassCastException: java.lang.Long cannot be cast to java.lang.Double
>     [junit]     at org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDeTypeDouble.serialize(DynamicSerDeTypeDouble.java:60)
>     [junit]     at org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDeFieldList.serialize(DynamicSerDeFieldList.java:235)
>     [junit]     at org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDeStructBase.serialize(DynamicSerDeStructBase.java:81)
>     [junit]     at org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe.serialize(DynamicSerDe.java:174)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
