[jira] [Commented] (HIVE-15580) Replace Spark's groupByKey operator with something with bounded memory

Ferdinand Xu (JIRA) Tue, 17 Jan 2017 20:05:13 -0800

    [ https://issues.apache.org/jira/browse/HIVE-15580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15827399#comment-15827399 ]


Ferdinand Xu commented on HIVE-15580:
-------------------------------------

Hi [~xuefuz], the main change is replacing *groupByKey* with 
*repartitionAndSortWithinPartitions*. I'd like to check my understanding. 
Before this patch:
GroupByShuffle produces one record per key, each carrying an iterator over 
that key's values, e.g.:
K1 -> iterator of {V11,V12,V13...}
K2 -> iterator of {V21,V22,V23...}
...

With this patch:
K1 -> V11
K1 -> V12
K1 -> V13
...
K2 -> V21
...

And we process the records one by one instead of fetching the values from a 
per-key iterator. If so, are there any side effects from this change?
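
To make the two shapes concrete, here is a minimal sketch in plain Python. It 
does not use the Spark API or the code in the patch; the sample records and 
the names grouped_shape, sorted_flat_shape and consume_flat are hypothetical. 
It only illustrates that a flat, key-sorted stream can be turned back into 
per-key groups by watching for key boundaries. In Spark itself, as far as I 
understand, the sort is done by the shuffle and can spill to disk, whereas 
this toy version sorts in memory.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical shuffle input for a single partition: (key, value) records.
records = [("K2", "V21"), ("K1", "V11"), ("K1", "V12"), ("K2", "V22"), ("K1", "V13")]


def grouped_shape(pairs):
    """groupByKey-like shape: all values of a key are materialized together."""
    out = {}
    for key, value in pairs:
        out.setdefault(key, []).append(value)
    return out


def sorted_flat_shape(pairs):
    """repartitionAndSortWithinPartitions-like shape: flat records ordered by key."""
    # sorted() is stable, so values keep their arrival order within each key.
    return sorted(pairs, key=itemgetter(0))


def consume_flat(sorted_pairs):
    """Walk the sorted records once, yielding (key, lazy value iterator) per key."""
    for key, group in groupby(sorted_pairs, key=itemgetter(0)):
        # The inner iterator must be consumed before moving on to the next key.
        yield key, (value for _, value in group)


grouped = grouped_shape(records)
flat = sorted_flat_shape(records)
regrouped = {key: list(values) for key, values in consume_flat(flat)}
```

In this sketch both paths see the same values per key. The differences are 
that the flat path delivers keys in sorted order and that each key's values 
must be read before advancing to the next key.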


> Replace Spark's groupByKey operator with something with bounded memory
> ----------------------------------------------------------------------
>
>                 Key: HIVE-15580
>                 URL: https://issues.apache.org/jira/browse/HIVE-15580
>             Project: Hive
>          Issue Type: Improvement
>          Components: Spark
>            Reporter: Xuefu Zhang
>            Assignee: Xuefu Zhang
>         Attachments: HIVE-15580.1.patch, HIVE-15580.1.patch, 
> HIVE-15580.2.patch, HIVE-15580.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
