[
https://issues.apache.org/jira/browse/PIG-4893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15334954#comment-15334954
]
liyunzhang_intel commented on PIG-4893:
---------------------------------------
[~pallavi.rao]: Previously, we appended all jars under lib/Pig and
lib/Spark/Pig/ to the classpath, SPARK_YARN_DIST_FILES and
SPARK_DIST_CLASSPATH. If we add these jars to SPARK_YARN_DIST_FILES, the YARN
container downloads them when the job starts, so the task deserialization time
includes the jar download time. In PIG-4903, we no longer append all jars to
SPARK_YARN_DIST_FILES. Instead, the needed jars are loaded dynamically (by
calling SparkContext#addFile) in SparkLauncher, which is what I did in this
JIRA.
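As a rough illustration of the dynamic-loading idea described above (this is
not the actual PIG-4893 patch), the sketch below collects the jars under a set
of directories and hands each path to a callback. The class name JarShipper,
its methods, and the callback are hypothetical; in a real launcher the callback
would presumably forward each path to SparkContext#addFile.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.List;
import java.util.function.Consumer;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, for illustration only.
final class JarShipper {

    // Return the *.jar files directly under dir, sorted for a stable order.
    // A missing or non-directory path yields an empty list.
    static List<Path> listJars(Path dir) throws IOException {
        if (!Files.isDirectory(dir)) {
            return Collections.emptyList();
        }
        try (Stream<Path> entries = Files.list(dir)) {
            return entries
                    .filter(p -> p.getFileName().toString().endsWith(".jar"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    // Pass every jar found under the given directories to addFile and
    // return how many were passed. addFile is an assumption standing in
    // for a call such as sparkContext.addFile(path).
    static int shipJars(List<Path> dirs, Consumer<String> addFile)
            throws IOException {
        int shipped = 0;
        for (Path dir : dirs) {
            for (Path jar : listJars(dir)) {
                addFile.accept(jar.toAbsolutePath().toString());
                shipped++;
            }
        }
        return shipped;
    }
}
```

The point of the callback is that jars are registered at launch time by the
driver, rather than being listed up front in SPARK_YARN_DIST_FILES.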
> Task deserialization time is too long for spark on yarn mode
> ------------------------------------------------------------
>
> Key: PIG-4893
> URL: https://issues.apache.org/jira/browse/PIG-4893
> Project: Pig
> Issue Type: Sub-task
> Components: spark
> Reporter: liyunzhang_intel
> Assignee: liyunzhang_intel
> Fix For: spark-branch
>
> Attachments: PIG-4893.patch, time.PNG
>
>
> I found that the task deserialization time is rather long when I run any of
> the PigMix scripts in Spark on YARN mode. See the attached picture: the task
> duration is 3s but the task deserialization takes 13s.
> My env is hadoop2.6+spark1.6.