[jira] [Commented] (SPARK-1881) Executor caching

nigel (JIRA) Sun, 01 Mar 2015 04:05:34 -0800

    [ 
https://issues.apache.org/jira/browse/SPARK-1881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14342181#comment-14342181
 ]


nigel commented on SPARK-1881:
------------------------------

Hello,
This was a query about Mesos, not YARN. I haven't used YARN, though I will shortly. 
In fact, I haven't really used Spark for a while. 
As far as I remember, this was due to be implemented but hadn't been at the time. I 
don't know the current status.
Regards


> Executor caching
> ----------------
>
>                 Key: SPARK-1881
>                 URL: https://issues.apache.org/jira/browse/SPARK-1881
>             Project: Spark
>          Issue Type: Improvement
>          Components: Mesos
>    Affects Versions: 1.0.0
>         Environment: centos 6.5, mesos 0.18.1
>            Reporter: nigel
>            Priority: Minor
>
> The problem is that the executor is copied for each run. We have a cluster 
> where the disks are of moderate size and each executor is nearly 170MB. This 
> executor is slow to copy and multiple runs take up a significant amount of 
> space.
> The improvement would be to make it smaller.
> Currently the examples are included in the executor, although they are not 
> needed for execution. It is easy to take them out, but it might be better 
> not to include them in the default build.
> Another improvement might be to cache the executor jar. The script below will 
> make a 'sparklite' executor which only downloads the jar file once (until the 
> tmp dir is wiped). The scripts (small) are downloaded each time as before.
> This example would need more work: the source and destination are currently 
> hard-coded, and it might be a good idea to check file dates and/or checksums 
> in case someone uploads a different jar with the same version.
> This might be a bit redundant, depending on what happens with other work on 
> executor caching.
> Comments welcome.
> --------------------------
> mkdir sparklite
> echo '58c58
> <   if [ -f "$FWDIR/RELEASE" ]; then
> ---
> >   if [ -f "$FWDIR/RELEASE" ] && [ -f "$FWDIR"/lib/spark-assembly*hadoop*.jar ]; then
> 60c60
> <   else
> ---
> >   elif [ -f "$ASSEMBLY_DIR"/spark-assembly*hadoop*.jar ]; then
> 61a62,68
> >   else
> > #Try the local one. If not there, download from hdfs
> >     if [ ! -f /tmp/sparklite/spark-assembly*hadoop*.jar ]; then
> >         mkdir /tmp/sparklite 2>/dev/null
> >         hdfs dfs -get /spark/spark-assembly*-hadoop*.jar /tmp/sparklite/
> >     fi    
> >     ASSEMBLY_JAR=$(ls /tmp/sparklite/spark-assembly*hadoop*.jar 2>/dev/null)
> 64a72
> > ' > cc.patch
> tar -C sparklite -xf spark-1.0.0.tgz 
> cd sparklite
> hdfs dfs -put ./spark-1.0.0/lib/spark-assembly-1.0.0-SNAPSHOT-hadoop2.4.0.jar /spark/
> rm -f spark-1.0.0/lib/*assembly*
> rm -f spark-1.0.0/lib/*example*
> rm -f spark-1.0.0/bin/*.cmd
> rm -rf spark-1.0.0/ec2
> rm -rf spark-1.0.0/lib
> rm -rf spark-1.0.0/conf
> rm -rf spark-1.0.0/examples
> patch spark-1.0.0/bin/compute-classpath.sh < cc.patch
> rm -f spark-1.0.0.tgz
> tar zcf spark-1.0.0.tgz spark-1.0.0
> hdfs dfs -rm /spark/spark-1.0.0.tgz
> hdfs dfs -put ./spark-1.0.0.tgz /spark/
> ------------------------
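
The description suggests checking checksums before trusting a cached jar. A minimal sketch of that idea follows. It assumes GNU sha256sum is available on the slaves and that the expected checksum is obtained separately (for example from a sidecar file uploaded next to the jar, which is an assumption and not part of the original script). The function names cache_is_fresh and fetch_if_stale are hypothetical.

```shell
#!/bin/sh
# Sketch only: reuse a cached file when its SHA-256 matches an expected value,
# otherwise fetch it again. Function names are hypothetical.

# cache_is_fresh FILE EXPECTED_SHA256
# Succeeds only if FILE exists and its checksum equals EXPECTED_SHA256.
cache_is_fresh() {
    file=$1
    expected=$2
    [ -f "$file" ] || return 1
    actual=$(sha256sum "$file" | cut -d' ' -f1)
    [ "$actual" = "$expected" ]
}

# fetch_if_stale FILE EXPECTED_SHA256 FETCH_COMMAND [ARGS...]
# Runs FETCH_COMMAND only when the cached FILE is missing or does not match,
# then verifies the freshly fetched copy.
fetch_if_stale() {
    file=$1
    expected=$2
    shift 2
    if ! cache_is_fresh "$file" "$expected"; then
        mkdir -p "$(dirname "$file")"
        # Remove any stale copy first so the fetch command does not
        # trip over an existing destination file.
        rm -f "$file"
        "$@" || return 1
        cache_is_fresh "$file" "$expected"
    fi
}
```

In the patched compute-classpath.sh this could stand in for the bare existence test, along the lines of `fetch_if_stale /tmp/sparklite/spark-assembly.jar "$EXPECTED_SHA256" hdfs dfs -get /spark/spark-assembly.jar /tmp/sparklite/`, where EXPECTED_SHA256 and the un-globbed jar name are placeholders.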



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
