[
https://issues.apache.org/jira/browse/PIG-4893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15311013#comment-15311013
]
Rohini Palaniswamy commented on PIG-4893:
-----------------------------------------
You should have the Spark jars in a global HDFS location (similar to the MapReduce
and Tez tarballs) and reference that instead of shipping them every time. This will
ensure they are downloaded to a node only once, no matter how many times different
users run scripts.
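For illustration only, a minimal config sketch of what "reference that instead of
shipping" could look like on Spark 1.6 / YARN, assuming the Spark assembly jar has
already been staged at a hypothetical HDFS path (the paths below are examples, not
the real locations); the MapReduce and Tez lines show the analogous framework-tarball
settings referred to below:

    # spark-defaults.conf (or the Spark properties Pig passes through): point
    # executors at a pre-staged assembly on HDFS so it is localized once per
    # node instead of being uploaded by every job
    spark.yarn.jar  hdfs:///apps/spark/spark-assembly-1.6.0-hadoop2.6.0.jar

    # analogous settings for the MapReduce and Tez tarballs in HDFS
    mapreduce.application.framework.path  hdfs:///apps/mapreduce/mr-framework.tar.gz#mr-framework
    tez.lib.uris                          hdfs:///apps/tez/tez.tar.gz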
You should not be shipping everything under the lib directory. Refer to the
MapReduce and Tez distributed cache setup code. Only the default essential jars -
JarManager.getDefaultJars() - are shipped. The jython jar and jruby jar are added by
those ScriptEngines only if they are part of the script. The rest come from the
MapReduce (mapreduce.application.framework.path) and Tez (tez.lib.uris) tarballs in HDFS.
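As a rough sketch (not the actual patch) of shipping only the essential jars,
assuming JarManager.getDefaultJars() returns the local paths of those jars as a
List<String> and that the Spark backend has a JavaSparkContext at hand:

    import java.util.List;
    import org.apache.pig.impl.util.JarManager;
    import org.apache.spark.api.java.JavaSparkContext;

    public class ShipDefaultJarsSketch {
        // Ship only Pig's default essential jars, not everything under lib/.
        static void shipDefaultJars(JavaSparkContext sc) {
            List<String> defaultJars = JarManager.getDefaultJars();
            for (String jar : defaultJars) {
                sc.addJar(jar); // uploaded once per application, cached on the nodes
            }
            // jython/jruby jars would be added separately by the ScriptEngines,
            // and only when the script actually uses them.
        }
    }

The remaining runtime classes would then come from the HDFS tarballs/assembly
referenced above rather than from per-job uploads.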
> Task deserialization time is too long for spark on yarn mode
> ------------------------------------------------------------
>
> Key: PIG-4893
> URL: https://issues.apache.org/jira/browse/PIG-4893
> Project: Pig
> Issue Type: Sub-task
> Components: spark
> Reporter: liyunzhang_intel
> Fix For: spark-branch
>
> Attachments: time.PNG
>
>
> I found that the task deserialization time is quite long when I run any of the
> PigMix scripts in Spark on YARN mode; see the attached picture. The task duration
> is 3s but the task deserialization is 13s.
> My environment is Hadoop 2.6 + Spark 1.6.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)