[
https://issues.apache.org/jira/browse/SPARK-1881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14342181#comment-14342181
]
nigel commented on SPARK-1881:
------------------------------
Hello,
This query was about Mesos, not YARN. I haven't used YARN, though I will shortly.
In fact, I haven't really used Spark for a while.
As far as I remember, this was due to be implemented but hadn't been at the
time. I don't know the current status.
Regards
> Executor caching
> ----------------
>
> Key: SPARK-1881
> URL: https://issues.apache.org/jira/browse/SPARK-1881
> Project: Spark
> Issue Type: Improvement
> Components: Mesos
> Affects Versions: 1.0.0
> Environment: centos 6.5, mesos 0.18.1
> Reporter: nigel
> Priority: Minor
>
> The problem is that the executor is copied for each run. We have a cluster
> where the disks are of moderate size and each executor is nearly 170MB. This
> executor is slow to copy and multiple runs take up a significant amount of
> space.
> The improvement would be to make it smaller.
> Currently the examples are included in there, which are not needed for
> execution. It is easy to take them out, but it might be better to not include
> them in the default build.
> Another improvement might be to cache the executor jar. The script below will
> make a 'sparklite' executor which only downloads the jar file once (until the
> tmp dir is wiped). The scripts (small) are downloaded each time as before.
> This example would need more work: the source and destination are currently
> hard-coded, and it might be a good idea to check file dates and/or checksums
> in case someone uploads a different jar with the same version.
> This might be a bit redundant, depending on what happens with other work on
> executor caching.
> Comments welcome.
> --------------------------
> mkdir sparklite
> echo '58c58
> < if [ -f "$FWDIR/RELEASE" ]; then
> ---
> > if [ -f "$FWDIR/RELEASE" ] && [ -f "$FWDIR"/lib/spark-assembly*hadoop*.jar ]; then
> 60c60
> < else
> ---
> > elif [ -f "$ASSEMBLY_DIR"/spark-assembly*hadoop*.jar ]; then
> 61a62,68
> > else
> > #Try the local one. If not there, download from hdfs
> > if [ ! -f /tmp/sparklite/spark-assembly*hadoop*.jar ]; then
> > mkdir /tmp/sparklite 2>/dev/null
> > hdfs dfs -get /spark/spark-assembly*-hadoop*.jar /tmp/sparklite/
> > fi
> > ASSEMBLY_JAR=$(ls /tmp/sparklite/spark-assembly*hadoop*.jar 2>/dev/null)
> 64a72
> > ' > cc.patch
> tar -C sparklite -xf spark-1.0.0.tgz
> cd sparklite
> hdfs dfs -put ./spark-1.0.0/lib/spark-assembly-1.0.0-SNAPSHOT-hadoop2.4.0.jar /spark/
> rm -f spark-1.0.0/lib/*assembly*
> rm -f spark-1.0.0/lib/*example*
> rm -f spark-1.0.0/bin/*.cmd
> rm -rf spark-1.0.0/ec2
> rm -rf spark-1.0.0/lib
> rm -rf spark-1.0.0/conf
> rm -rf spark-1.0.0/examples
> patch spark-1.0.0/bin/compute-classpath.sh < cc.patch
> rm -f spark-1.0.0.tgz
> tar zcf spark-1.0.0.tgz spark-1.0.0
> hdfs dfs -rm /spark/spark-1.0.0.tgz
> hdfs dfs -put ./spark-1.0.0.tgz /spark/
> ------------------------
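
The checksum check suggested in the description could be sketched roughly as
below. This is only an illustration, not part of the original script: the
function names (file_sha256, cache_is_fresh, refresh_cache) are hypothetical,
and it assumes the expected SHA-256 of the jar is published somewhere the
executor can read it, which the original setup does not do. It also assumes
sha256sum (or shasum) is available on the slaves.

```shell
# Print the SHA-256 of file $1 (uses sha256sum, falling back to shasum).
file_sha256() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | cut -d' ' -f1
  else
    shasum -a 256 "$1" | cut -d' ' -f1
  fi
}

# Return 0 if file $1 exists and its SHA-256 equals $2, non-zero otherwise.
cache_is_fresh() {
  local file="$1" expected="$2"
  [ -f "$file" ] || return 1
  [ "$(file_sha256 "$file")" = "$expected" ]
}

# Usage: refresh_cache <cached-file> <expected-sha256> <fetch command...>
# The fetch command is run with a temporary destination path appended,
# and only when the cached copy is missing or stale.
refresh_cache() {
  local file="$1" expected="$2" tmp
  shift 2
  if cache_is_fresh "$file" "$expected"; then
    echo "cache hit: $file"
    return 0
  fi
  echo "cache miss: $file"
  mkdir -p "$(dirname "$file")"
  # Fetch to a temporary name first so that another executor starting
  # at the same time never picks up a half-written jar.
  tmp="$file.tmp.$$"
  "$@" "$tmp" || return 1
  # Reject the download if it does not match the expected checksum.
  if ! cache_is_fresh "$tmp" "$expected"; then
    rm -f "$tmp"
    return 1
  fi
  mv -f "$tmp" "$file"
}
```

In the patched compute-classpath.sh this might be called along the lines of
`refresh_cache /tmp/sparklite/spark-assembly.jar "$EXPECTED_SHA" hdfs dfs -get /spark/spark-assembly.jar`,
where the fixed jar name and EXPECTED_SHA are placeholders rather than
anything the current script defines.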