[ https://issues.apache.org/jira/browse/HIVE-707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736737#action_12736737 ]
Ashish Thusoo commented on HIVE-707:
------------------------------------
Found the JIRA...
Since group by is done in the reducer, you could just use the trick that is
used for
distribute by x sort by y
when we do MAP and REDUCE operators. By setting up the reduce sink in a similar
way, you would be able to ensure that each reducer gets the rows for a value of
x in the sorted order of y. You can look at how we generate plans for the
transform operator and use the same strategy in the group by code.
That should work, and of course in this case we would have to turn off any
map-side aggregation?
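
To make the idea concrete, here is a rough Python simulation of that shuffle. It is only a sketch and not Hive code: the names shuffle, group_concat and run are made up for illustration, the hash partitioning merely stands in for the reduce sink, and it assumes the rows are sorted by (x, y) inside each reducer so that all rows of one x arrive together.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def shuffle(rows, num_reducers):
    """Mimic 'distribute by x sort by x, y' for (x, y) rows (hypothetical helper)."""
    partitions = defaultdict(list)
    for x, y in rows:
        # distribute by x: every row with the same x lands on the same reducer.
        partitions[hash(x) % num_reducers].append((x, y))
    # sort by: each reducer sees its own rows ordered by x, then by y.
    return [sorted(part) for part in partitions.values()]


def group_concat(sorted_rows, sep=","):
    """Reducer side: stream over the sorted rows and concatenate y per x."""
    for x, group in groupby(sorted_rows, key=itemgetter(0)):
        yield x, sep.join(str(y) for _, y in group)


def run(rows, num_reducers=2):
    """Run the simulated shuffle and collect the output of every reducer."""
    result = {}
    for part in shuffle(rows, num_reducers):
        result.update(group_concat(part))
    return result
```

For example, run([("a", 3), ("b", 1), ("a", 1)]) yields "1,3" for a and "1" for b. Nothing is combined before the shuffle, which is the simulated counterpart of turning off map-side aggregation: a partial concatenation done on the map side would not respect the order of y.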
> add group_concat
> ----------------
>
> Key: HIVE-707
> URL: https://issues.apache.org/jira/browse/HIVE-707
> Project: Hadoop Hive
> Issue Type: New Feature
> Components: Query Processor
> Reporter: Namit Jain
> Assignee: Min Zhou
>
> Moving the discussion to a new jira:
> I've implemented group_cat() in a rush, and found some things difficult to
> solve:
> 1. The function group_cat() has an internal order by clause; currently, we
> can't implement such an aggregation in Hive.
> 2. When the strings to be group-concatenated are too large, in other words,
> if data skew appears, there is often not enough memory to store such a big
> result.