[jira] [Commented] (SPARK-1881) Executor caching

Sean Owen (JIRA) Sun, 01 Mar 2015 03:50:19 -0800

    [ 
https://issues.apache.org/jira/browse/SPARK-1881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14342163#comment-14342163
 ]


Sean Owen commented on SPARK-1881:
----------------------------------

Are you asking for something like {{spark.yarn.jar}} for Mesos? That setting 
already implements the caching you describe.
I don't think you'd want to modify the assembly JAR this way, but caching is 
certainly possible.
I haven't kept up with Mesos, so this may already be implemented somehow; I 
don't see a {{spark.mesos.jar}} setting, though.

> Executor caching
> ----------------
>
>                 Key: SPARK-1881
>                 URL: https://issues.apache.org/jira/browse/SPARK-1881
>             Project: Spark
>          Issue Type: Improvement
>          Components: Mesos
>    Affects Versions: 1.0.0
>         Environment: centos 6.5, mesos 0.18.1
>            Reporter: nigel
>            Priority: Minor
>
> The problem is that the executor is copied for each run. We have a cluster 
> where the disks are of moderate size and each executor is nearly 170MB. The 
> executor is slow to copy, and multiple runs take up a significant amount of 
> space.
> One improvement would be to make it smaller.
> Currently the examples are included, although they are not needed for 
> execution. It is easy to take them out, but it might be better not to include 
> them in the default build.
> Another improvement might be to cache the executor jar. The script below 
> makes a 'sparklite' executor which downloads the jar file only once (until the 
> tmp dir is wiped). The (small) scripts are downloaded each time as before.
> This example would need more work: the source and destination are currently 
> hard-coded, and it might be a good idea to check file dates and/or checksums 
> in case someone uploads a different jar with the same version.
> This might be a bit redundant, depending on what happens with other work on 
> executor caching.
> Comments welcome.
> --------------------------
> mkdir sparklite
> echo '58c58
> <   if [ -f "$FWDIR/RELEASE" ]; then
> ---
> >   if [ -f "$FWDIR/RELEASE" ] && [ -f "$FWDIR"/lib/spark-assembly*hadoop*.jar ]; then
> 60c60
> <   else
> ---
> >   elif [ -f "$ASSEMBLY_DIR"/spark-assembly*hadoop*.jar ]; then
> 61a62,68
> >   else
> > #Try the local one. If not there, download from hdfs
> >     if [ ! -f /tmp/sparklite/spark-assembly*hadoop*.jar ]; then
> >         mkdir /tmp/sparklite 2>/dev/null
> >         hdfs dfs -get /spark/spark-assembly*-hadoop*.jar /tmp/sparklite/
> >     fi    
> >     ASSEMBLY_JAR=$(ls /tmp/sparklite/spark-assembly*hadoop*.jar 2>/dev/null)
> 64a72
> > ' > cc.patch
> tar -C sparklite -xf spark-1.0.0.tgz 
> cd sparklite
> hdfs dfs -put ./spark-1.0.0/lib/spark-assembly-1.0.0-SNAPSHOT-hadoop2.4.0.jar /spark/
> rm -f spark-1.0.0/lib/*assembly*
> rm -f spark-1.0.0/lib/*example*
> rm -f spark-1.0.0/bin/*.cmd
> rm -rf spark-1.0.0/ec2
> rm -rf spark-1.0.0/lib
> rm -rf spark-1.0.0/conf
> rm -rf spark-1.0.0/examples
> patch spark-1.0.0/bin/compute-classpath.sh < ../cc.patch
> rm -f spark-1.0.0.tgz
> tar zcf spark-1.0.0.tgz spark-1.0.0
> hdfs dfs -rm /spark/spark-1.0.0.tgz
> hdfs dfs -put ./spark-1.0.0.tgz /spark/
> ------------------------
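
The description suggests checking checksums so that a re-uploaded jar with the same version is not served from a stale cache. A minimal sketch of that idea follows. It is not part of the reporter's script: the names cache_fetch, file_sha256 and FETCH_CMD are hypothetical, the expected checksum is assumed to be published somewhere alongside the jar, and sha256sum (GNU coreutils) is assumed to be available. FETCH_CMD defaults to cp so the sketch runs without a cluster; on a real node it might be set to "hdfs dfs -get".

```shell
#!/bin/sh
# Hypothetical fetch command; defaults to cp so the sketch runs anywhere.
# On a cluster this could be set to: FETCH_CMD="hdfs dfs -get"
FETCH_CMD="${FETCH_CMD:-cp}"

# Print the SHA-256 digest of a file (first field of sha256sum output).
file_sha256() {
    sha256sum "$1" | cut -d' ' -f1
}

# cache_fetch SRC DEST EXPECTED_SHA
# Reuses DEST when it exists and matches EXPECTED_SHA; otherwise fetches
# SRC again. Prints "hit" when the cached copy is reused, "fetched" otherwise.
cache_fetch() {
    src=$1
    dest=$2
    expected=$3
    if [ -f "$dest" ] && [ "$(file_sha256 "$dest")" = "$expected" ]; then
        echo hit
        return 0
    fi
    mkdir -p "$(dirname "$dest")"
    # Remove any stale copy first, since some fetch commands refuse to overwrite.
    rm -f "$dest"
    $FETCH_CMD "$src" "$dest" || return 1
    echo fetched
}
```

With a helper along these lines, the download step in the patched compute-classpath.sh could call cache_fetch instead of testing only for the file's existence, at the cost of hashing the cached jar on each launch.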



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

