[
https://issues.apache.org/jira/browse/PIG-4893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15311013#comment-15311013
]
Rohini Palaniswamy commented on PIG-4893:
-----------------------------------------
You should have the Spark jars in a global HDFS location (similar to the MapReduce
and Tez tarballs) and reference that instead of shipping them every time. This will
ensure they are downloaded to a node only once, no matter how many times different
users run scripts.
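For illustration only, a minimal config sketch of what "reference that instead of
shipping" could look like on Spark 1.6 / YARN, assuming the Spark assembly jar has
already been staged at a hypothetical HDFS path (the paths below are examples, not
the real locations); the MapReduce and Tez lines show the analogous framework-tarball
settings referred to below:

    # spark-defaults.conf (or the Spark properties Pig passes through): point
    # executors at a pre-staged assembly on HDFS so it is localized once per
    # node instead of being uploaded by every job
    spark.yarn.jar  hdfs:///apps/spark/spark-assembly-1.6.0-hadoop2.6.0.jar

    # analogous settings for the MapReduce and Tez tarballs in HDFS
    mapreduce.application.framework.path  hdfs:///apps/mapreduce/mr-framework.tar.gz#mr-framework
    tez.lib.uris                          hdfs:///apps/tez/tez.tar.gz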
You should not be shipping everything under the lib directory. Refer to the
MapReduce and Tez distributed cache setup code. Only the default essential jars -
JarManager.getDefaultJars() - are shipped. The jython jar and jruby jar are added by
those ScriptEngines only if they are part of the script. The rest come from the
MapReduce (mapreduce.application.framework.path) and Tez (tez.lib.uris) tarballs in HDFS.
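As a rough sketch (not the actual patch) of shipping only the essential jars,
assuming JarManager.getDefaultJars() returns the local paths of those jars as a
List<String> and that the Spark backend has a JavaSparkContext at hand:

    import java.util.List;
    import org.apache.pig.impl.util.JarManager;
    import org.apache.spark.api.java.JavaSparkContext;

    public class ShipDefaultJarsSketch {
        // Ship only Pig's default essential jars, not everything under lib/.
        static void shipDefaultJars(JavaSparkContext sc) {
            List<String> defaultJars = JarManager.getDefaultJars();
            for (String jar : defaultJars) {
                sc.addJar(jar); // uploaded once per application, cached on the nodes
            }
            // jython/jruby jars would be added separately by the ScriptEngines,
            // and only when the script actually uses them.
        }
    }

The remaining runtime classes would then come from the HDFS tarballs/assembly
referenced above rather than from per-job uploads.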
> Task deserialization time is too long for spark on yarn mode
> ------------------------------------------------------------
>
> Key: PIG-4893
> URL: https://issues.apache.org/jira/browse/PIG-4893
> Project: Pig
> Issue Type: Sub-task
> Components: spark
> Reporter: liyunzhang_intel
> Fix For: spark-branch
>
> Attachments: time.PNG
>
>
> I found that the task deserialization time is quite long when I run any of the
> PigMix scripts in Spark on YARN mode; see the attached picture. The task duration
> is 3s but the task deserialization is 13s.
> My environment is Hadoop 2.6 + Spark 1.6.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)