[ 
https://issues.apache.org/jira/browse/HIVE-707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736695#action_12736695
 ] 

Namit Jain commented on HIVE-707:
---------------------------------

I agree that 1. is a problem - we don't have any good way to handle the ORDER BY.

A workaround would be to first sort the results in a sub-query and then apply 
group_concat outside the sub-query. But, syntactically, that is more painful.
Is there a requirement for the ordering?
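
To make the workaround concrete, here is a minimal in-memory sketch in Java. It is not Hive code: the class SortedGroupConcat, the Row type, and both method names are hypothetical. The sorting step stands in for the sub-query and the concatenation step stands in for an order-unaware group_concat, which just appends values in the order they arrive. Whether a distributed plan would actually keep the sub-query's order all the way to the aggregation is an assumption this sketch does not test.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names come from Hive.
class SortedGroupConcat {

    // One input row: the grouping key and the value to concatenate.
    static final class Row {
        final String key;
        final String value;

        Row(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    // Stand-in for the inner sub-query: sort by key, then by value.
    static List<Row> sortRows(List<Row> rows) {
        List<Row> sorted = new ArrayList<>(rows);
        sorted.sort(Comparator.comparing((Row r) -> r.key)
                .thenComparing(r -> r.value));
        return sorted;
    }

    // Stand-in for the outer group_concat: it knows nothing about ordering
    // and simply appends each value to its group in arrival order.
    static Map<String, String> groupConcat(List<Row> rows, String separator) {
        Map<String, StringBuilder> buffers = new LinkedHashMap<>();
        for (Row r : rows) {
            StringBuilder sb = buffers.get(r.key);
            if (sb == null) {
                sb = new StringBuilder();
                buffers.put(r.key, sb);
            } else {
                sb.append(separator);
            }
            sb.append(r.value);
        }
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, StringBuilder> e : buffers.entrySet()) {
            result.put(e.getKey(), e.getValue().toString());
        }
        return result;
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
                new Row("a", "3"), new Row("b", "1"),
                new Row("a", "1"), new Row("a", "2"));
        // Without the sort, the values appear in input order.
        System.out.println(groupConcat(rows, ","));
        // With the sort applied first, each group comes out ordered.
        System.out.println(groupConcat(sortRows(rows), ","));
    }
}
```

The point is that the aggregation itself needs no ORDER BY support as long as its input is already sorted, at the cost of the extra nesting in the query.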


The separator can be handled as a configuration parameter, like the maximum length.
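
As a rough sketch of what those two parameters could look like inside the aggregation buffer, again in plain Java rather than Hive code: the class BoundedConcatBuffer and its methods are hypothetical, and truncating the result once the maximum length is reached is an assumption about the desired behaviour, not something decided in this thread.

```java
// Hypothetical illustration only; none of these names come from Hive.
class BoundedConcatBuffer {
    private final String separator; // would come from configuration
    private final int maxLength;    // would come from configuration
    private final StringBuilder buffer = new StringBuilder();
    private boolean empty = true;
    private boolean truncated = false;

    BoundedConcatBuffer(String separator, int maxLength) {
        if (maxLength < 0) {
            throw new IllegalArgumentException("maxLength must be >= 0");
        }
        this.separator = separator;
        this.maxLength = maxLength;
    }

    // Appends one value, cutting the result off at maxLength characters.
    // Once truncated, later values are ignored, so memory stays bounded.
    void add(String value) {
        if (truncated) {
            return;
        }
        String piece = empty ? value : separator + value;
        empty = false;
        int room = maxLength - buffer.length();
        if (piece.length() > room) {
            buffer.append(piece, 0, room);
            truncated = true;
        } else {
            buffer.append(piece);
        }
    }

    String result() {
        return buffer.toString();
    }

    boolean isTruncated() {
        return truncated;
    }
}
```

A bound of this kind would also limit the memory used per group, which is the concern raised in point 2 of the issue description.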

The other option is to treat group_concat() specially, all the way from the parser 
- I don't like it because it is kind of hacky.
I think we should first find out whether anyone needs ORDER BY in 
group_concat and then think about it.





> add group_concat
> ----------------
>
>                 Key: HIVE-707
>                 URL: https://issues.apache.org/jira/browse/HIVE-707
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Query Processor
>            Reporter: Namit Jain
>            Assignee: Min Zhou
>
> Moving the discussion to a new jira:
> I've implemented group_cat() in a rush, and found some things difficult to 
> solve:
> 1. group_cat() has an internal ORDER BY clause; currently, we can't 
> implement such an aggregation in Hive.
> 2. When the strings to be concatenated are too large, in other words, 
> if data skew appears, there is often not enough memory to store such a big 
> result.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
