[jira] [Commented] (HIVE-15104) Hive on Spark generate more shuffle data than hive on mr

Xuefu Zhang (JIRA) Thu, 13 Jul 2017 00:54:20 -0700

    [ https://issues.apache.org/jira/browse/HIVE-15104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16085324#comment-16085324 ]


Xuefu Zhang commented on HIVE-15104:
------------------------------------

[~lirui], I'm wondering if there is anything new (other than moving code 
around). Last time we benchmarked this, we found an actual performance 
degradation. We can benchmark again, and if the degradation still exists, we 
may not want this, at least not by default. I wasn't able to figure out why 
the degradation might happen.

> Hive on Spark generate more shuffle data than hive on mr
> --------------------------------------------------------
>
>                 Key: HIVE-15104
>                 URL: https://issues.apache.org/jira/browse/HIVE-15104
>             Project: Hive
>          Issue Type: Bug
>          Components: Spark
>    Affects Versions: 1.2.1
>            Reporter: wangwenli
>            Assignee: Rui Li
>         Attachments: HIVE-15104.1.patch, HIVE-15104.2.patch, 
> HIVE-15104.3.patch, HIVE-15104.4.patch, TPC-H 100G.xlsx
>
>
> The same SQL, running on the Spark and MR engines, generates different sizes 
> of shuffle data.
> I think this is because Hive on MR serializes only part of the HiveKey, 
> whereas Hive on Spark, which uses Kryo, serializes the full HiveKey object.
> What is your opinion?
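
To make the reporter's hypothesis concrete, here is a minimal sketch of how the two serialization strategies could differ in size. It is an illustration only and is not Hive or Kryo code: `PaddedKey`, `compact`, and `full` are hypothetical names, and the assumption is simply that a key holds a backing buffer larger than its valid length.

```java
import java.nio.ByteBuffer;

public class ShuffleSizeSketch {

    // Hypothetical stand-in for a key whose backing buffer is larger
    // than the number of meaningful bytes it currently holds.
    static final class PaddedKey {
        final byte[] buffer; // capacity, may exceed the valid length
        final int length;    // number of valid bytes at the start of buffer

        PaddedKey(byte[] buffer, int length) {
            this.buffer = buffer;
            this.length = length;
        }
    }

    // Writes a length prefix followed by the valid bytes only.
    static byte[] compact(PaddedKey key) {
        ByteBuffer out = ByteBuffer.allocate(Integer.BYTES + key.length);
        out.putInt(key.length);
        out.put(key.buffer, 0, key.length);
        return out.array();
    }

    // Writes every field: the whole backing buffer (with its own length
    // prefix) plus the valid-length field. This is an assumption about
    // what a generic field-by-field serializer might emit.
    static byte[] full(PaddedKey key) {
        ByteBuffer out = ByteBuffer.allocate(2 * Integer.BYTES + key.buffer.length);
        out.putInt(key.buffer.length);
        out.put(key.buffer);
        out.putInt(key.length);
        return out.array();
    }

    public static void main(String[] args) {
        // 64-byte buffer, of which only the first 10 bytes are valid.
        PaddedKey key = new PaddedKey(new byte[64], 10);
        System.out.println("compact: " + compact(key).length + " bytes");
        System.out.println("full:    " + full(key).length + " bytes");
    }
}
```

With a 64-byte buffer holding 10 valid bytes, the compact form is 14 bytes and the full form is 72 bytes. Whether this is what actually happens in Hive on Spark is exactly the question the issue raises.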



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
