[ 
https://issues.apache.org/jira/browse/PIG-4893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15334954#comment-15334954
 ] 

liyunzhang_intel commented on PIG-4893:
---------------------------------------

[~pallavi.rao]: Previously we appended all jars under lib/Pig and 
lib/Spark/Pig/ to the classpath, SPARK_YARN_DIST_FILES and 
SPARK_DIST_CLASSPATH. If we add these jars to SPARK_YARN_DIST_FILES, the YARN 
container downloads them when the job starts, and the task deserialization 
time includes the time spent downloading the jars. In PIG-4903 we no longer 
append all jars to SPARK_YARN_DIST_FILES. Instead, the needed jars are loaded 
dynamically (by calling SparkContext#addFile) in SparkLauncher, which is what 
I did in this jira.
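
To illustrate the idea only, here is a rough sketch of dynamic loading. It is 
not the code in the attached patch. The class name DynamicJarLoader, its 
method names and the directory arguments are made up for this example, and 
the addFile callback is a stand-in for a call such as SparkContext#addFile, 
so the sketch compiles without Spark on the classpath.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper, not part of Pig: registers jars one by one at launch
// time instead of listing them all in SPARK_YARN_DIST_FILES.
class DynamicJarLoader {

    // Collect the *.jar files directly under each directory.
    // Directories that do not exist are skipped.
    static List<Path> listJars(List<Path> dirs) throws IOException {
        List<Path> jars = new ArrayList<>();
        for (Path dir : dirs) {
            if (!Files.isDirectory(dir)) {
                continue;
            }
            try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.jar")) {
                for (Path jar : stream) {
                    jars.add(jar);
                }
            }
        }
        Collections.sort(jars); // stable order for logging and tests
        return jars;
    }

    // Pass the absolute path of every jar to the callback and return the count.
    // Assumption: in a real launcher the callback would wrap something like
    // sparkContext.addFile(path).
    static int registerJars(List<Path> dirs, Consumer<String> addFile) throws IOException {
        List<Path> jars = listJars(dirs);
        for (Path jar : jars) {
            addFile.accept(jar.toAbsolutePath().toString());
        }
        return jars.size();
    }
}
```

In this sketch a launcher would call something like 
DynamicJarLoader.registerJars(dirs, path -> sparkContext.addFile(path)), 
where sparkContext is assumed to be an already created context.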

> Task deserialization time is too long for spark on yarn mode
> ------------------------------------------------------------
>
>                 Key: PIG-4893
>                 URL: https://issues.apache.org/jira/browse/PIG-4893
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>            Reporter: liyunzhang_intel
>            Assignee: liyunzhang_intel
>             Fix For: spark-branch
>
>         Attachments: PIG-4893.patch, time.PNG
>
>
> I found that the task deserialization time is rather long when I run any of 
> the pigmix scripts in Spark on YARN mode. See the attached picture. The task 
> duration is 3s but the task deserialization time is 13s.
> My environment is Hadoop 2.6 + Spark 1.6.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
