[jira] [Commented] (HIVE-15104) Hive on Spark generate more shuffle data than hive on mr

Xuefu Zhang (JIRA) Fri, 12 May 2017 06:18:08 -0700

    [ 
https://issues.apache.org/jira/browse/HIVE-15104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16008107#comment-16008107
 ]


Xuefu Zhang commented on HIVE-15104:
------------------------------------

[~lirui], great progress! Thanks for keeping up the effort.

As for the Kryo class relocation, I think Hive did that to avoid Kryo version 
differences between Spark and Hive (the git history might confirm this). I'm 
concerned that class conflicts might come back if we stop relocating Kryo. Any thoughts?

> Hive on Spark generate more shuffle data than hive on mr
> --------------------------------------------------------
>
>                 Key: HIVE-15104
>                 URL: https://issues.apache.org/jira/browse/HIVE-15104
>             Project: Hive
>          Issue Type: Bug
>          Components: Spark
>    Affects Versions: 1.2.1
>            Reporter: wangwenli
>            Assignee: Rui Li
>         Attachments: HIVE-15104.1.patch
>
>
> The same SQL, run on the Spark and MR engines, generates different amounts 
> of shuffle data.
> I think this is because Hive on MR serializes only part of the HiveKey, whereas 
> Hive on Spark, which uses Kryo, serializes the full HiveKey object.
> What is your opinion?
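
The size difference described above can be illustrated with a minimal, self-contained sketch. It uses neither Hive nor Kryo: the Key class, its fields, and both write methods are hypothetical stand-ins, assuming a key that keeps its data in an over-allocated backing buffer alongside a few bookkeeping fields. One method writes only the valid bytes, the other writes every field, roughly what a generic field-by-field serializer might do.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class ShuffleSizeSketch {

    /** Hypothetical stand-in for a shuffle key; not Hive's actual HiveKey. */
    public static final class Key {
        final byte[] buffer;  // backing array, may be larger than the valid data
        final int length;     // number of valid bytes at the start of buffer
        final int hashCode;   // precomputed hash used for partitioning

        public Key(byte[] data, int capacity, int hashCode) {
            this.buffer = new byte[Math.max(capacity, data.length)];
            System.arraycopy(data, 0, this.buffer, 0, data.length);
            this.length = data.length;
            this.hashCode = hashCode;
        }
    }

    /** Writes a length prefix followed by the valid bytes only. */
    public static byte[] writeValidBytesOnly(Key key) {
        return ByteBuffer.allocate(4 + key.length)
                .putInt(key.length)
                .put(key.buffer, 0, key.length)
                .array();
    }

    /** Writes every field, including the unused tail of the backing array. */
    public static byte[] writeAllFields(Key key) {
        return ByteBuffer.allocate(4 + key.buffer.length + 4 + 4)
                .putInt(key.buffer.length)
                .put(key.buffer)
                .putInt(key.length)
                .putInt(key.hashCode)
                .array();
    }

    public static void main(String[] args) {
        // A 6-byte key stored in a 64-byte buffer (sizes chosen for illustration).
        Key key = new Key("row-42".getBytes(StandardCharsets.UTF_8), 64, 7);
        System.out.println("valid bytes only: " + writeValidBytesOnly(key).length);
        System.out.println("all fields:       " + writeAllFields(key).length);
    }
}
```

Under these assumptions, the per-key overhead of the second approach grows with the unused buffer capacity, which is one way the same query could produce more shuffle data on one engine than on the other.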



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
