[jira] [Commented] (HIVE-20032) Don't serialize hashCode for repartitionAndSortWithinPartitions

Rui Li (JIRA) Wed, 25 Jul 2018 05:52:33 -0700


    [ 
https://issues.apache.org/jira/browse/HIVE-20032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16555466#comment-16555466
 ]


Rui Li commented on HIVE-20032:
-------------------------------

bq. I originally thought that --jars would add the specified jars to the 
executor and driver class path, but apparently that's not the case.
Is this a Spark issue? According to the 
[docs|https://github.com/apache/spark/blob/v2.3.0/core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala#L525],
the jars should be added to the classpath. It seems ApplicationMaster uses a 
[custom class loader|https://github.com/apache/spark/blob/v2.3.0/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala#L152]
for the user class, which should load the jars added by {{--jars}}.
A possible cause is that the jars are usually 
[not added to the system class loader|https://github.com/apache/spark/blob/v2.3.0/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala#L1273].
That can sometimes give you a ClassNotFoundException even when the jars are 
present; the class just has to be loaded through the correct class loader.
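
To illustrate the class loader point, here is a minimal Java sketch. It assumes the jars in question are reachable through the thread context class loader; I have not verified that for this case. The class name {{ContextLoaderSketch}} and its methods are hypothetical helpers, not part of Hive or Spark.

```java
public final class ContextLoaderSketch {
    private ContextLoaderSketch() {}

    /**
     * Resolves a class through the thread context class loader.
     * Assumption: user jars are visible to that loader rather than
     * to the system class loader. Falls back to this class's own
     * loader when no context class loader is set.
     */
    public static Class<?> loadClass(String name) throws ClassNotFoundException {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ContextLoaderSketch.class.getClassLoader();
        }
        // initialize = false: only locate the class, do not run static initializers
        return Class.forName(name, false, loader);
    }

    /** Returns true if the named class can be resolved via loadClass. */
    public static boolean isVisible(String name) {
        try {
            loadClass(name);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }
}
```

Calling {{ContextLoaderSketch.isVisible(...)}} with the name of the class that fails to load, once from the failing code path and once with the system class loader substituted, would show whether the problem is a missing jar or just the wrong loader.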

> Don't serialize hashCode for repartitionAndSortWithinPartitions
> ---------------------------------------------------------------
>
>                 Key: HIVE-20032
>                 URL: https://issues.apache.org/jira/browse/HIVE-20032
>             Project: Hive
>          Issue Type: Improvement
>          Components: Spark
>            Reporter: Sahil Takiar
>            Assignee: Sahil Takiar
>            Priority: Major
>         Attachments: HIVE-20032.1.patch, HIVE-20032.2.patch, 
> HIVE-20032.3.patch, HIVE-20032.4.patch, HIVE-20032.5.patch, 
> HIVE-20032.6.patch, HIVE-20032.7.patch, HIVE-20032.8.patch, HIVE-20032.9.patch
>
>
> Follow-up on HIVE-15104: if we don't enable RDD caching or groupByShuffles, 
> then we don't need to serialize the hashCode when shuffling data in HoS.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
