[ https://issues.apache.org/jira/browse/SPARK-1881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14342181#comment-14342181 ]
nigel commented on SPARK-1881:
------------------------------

Hello,
This was a query for Mesos, not YARN. I haven't used YARN, though I will shortly. In fact, I haven't really used Spark for a while.
As far as I remember, this was due to be implemented, but wasn't at the time. I don't know the current status.
Regards

> Executor caching
> ----------------
>
>                 Key: SPARK-1881
>                 URL: https://issues.apache.org/jira/browse/SPARK-1881
>             Project: Spark
>          Issue Type: Improvement
>          Components: Mesos
>    Affects Versions: 1.0.0
>         Environment: centos 6.5, mesos 0.18.1
>            Reporter: nigel
>            Priority: Minor
>
> The problem is that the executor is copied for each run. We have a cluster
> where the disks are of moderate size and each executor is nearly 170MB. This
> executor is slow to copy and multiple runs take up a significant amount of
> space.
> The improvement would be to make it smaller.
> Currently the examples are included in there, which are not needed for
> execution. It is easy to take them out, but it might be better to not include
> them in the default build.
> Another improvement might be to cache the executor jar. The script below will
> make a 'sparklite' executor which only downloads the jar file once (until the
> tmp dir is wiped). The scripts (small) are downloaded each time as before.
> This example would need more work: the source and dest are currently
> hard-coded and it might be a good idea to check file dates and/or checksums
> in case someone was uploading jars with the same version.
> This might be a bit redundant, depending on what happens with other work on
> executor caching.
> Comments welcome.
> --------------------------
> mkdir sparklite
> echo '58c58
> < if [ -f "$FWDIR/RELEASE" ]; then
> ---
> > if [ -f "$FWDIR/RELEASE" ] && [ -f "$FWDIR"/lib/spark-assembly*hadoop*.jar ]; then
> 60c60
> < else
> ---
> > elif [ -f "$ASSEMBLY_DIR"/spark-assembly*hadoop*.jar ]; then
> 61a62,68
> > else
> > #Try the local one. If not there, download from hdfs
> > if [ ! -f /tmp/sparklite/spark-assembly*hadoop*.jar ]; then
> > mkdir /tmp/sparklite 2>/dev/null
> > hdfs dfs -get /spark/spark-assembly*-hadoop*.jar /tmp/sparklite/
> > fi
> > ASSEMBLY_JAR=$(ls /tmp/sparklite/spark-assembly*hadoop*.jar 2>/dev/null)
> 64a72
> >
> ' > cc.patch
> tar -C sparklite -xf spark-1.0.0.tgz
> cd sparklite
> hdfs dfs -put ./spark-1.0.0/lib/spark-assembly-1.0.0-SNAPSHOT-hadoop2.4.0.jar /spark/
> rm -f spark-1.0.0/lib/*assembly*
> rm -f spark-1.0.0/lib/*example*
> rm -f spark-1.0.0/bin/*.cmd
> rm -rf spark-1.0.0/ec2
> rm -rf spark-1.0.0/lib
> rm -rf spark-1.0.0/conf
> rm -rf spark-1.0.0/examples
> patch spark-1.0.0/bin/compute-classpath.sh < cc.patch
> rm -f spark-1.0.0.tgz
> tar zcf spark-1.0.0.tgz spark-1.0.0
> hdfs dfs -rm /spark/spark-1.0.0.tgz
> hdfs dfs -put ./spark-1.0.0.tgz /spark/
> ------------------------

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
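
The issue description suggests checking checksums before reusing a cached jar, in case a jar with the same version was uploaded again. A minimal sketch of that check might look like the following. It assumes a sidecar file holding the jar's MD5 is published next to the jar; that convention, the function name needs_refresh and the .md5 path are hypothetical and not part of Spark or of the script in the issue.

```shell
#!/bin/sh
# Sketch only: decide whether a locally cached jar must be re-downloaded.
# The sidecar-checksum convention is an assumption, not something Spark
# or compute-classpath.sh provides.

# needs_refresh CACHED_JAR EXPECTED_MD5
# Returns 0 (true) when the jar is missing or its MD5 differs from the
# expected value; returns 1 when the cached copy can be reused.
needs_refresh() {
    cached_jar=$1
    expected_md5=$2
    # A missing file always needs a download.
    [ -f "$cached_jar" ] || return 0
    # md5sum prints "<hash>  <file>"; keep only the hash.
    actual_md5=$(md5sum "$cached_jar" | cut -d' ' -f1)
    [ "$actual_md5" != "$expected_md5" ]
}

# Hypothetical usage inside the patched script, assuming the file
# /spark/spark-assembly.md5 was uploaded alongside the jar:
#   expected=$(hdfs dfs -cat /spark/spark-assembly.md5)
#   jar=$(ls /tmp/sparklite/spark-assembly*hadoop*.jar 2>/dev/null)
#   if needs_refresh "$jar" "$expected"; then
#       rm -f /tmp/sparklite/spark-assembly*hadoop*.jar
#       hdfs dfs -get /spark/spark-assembly*-hadoop*.jar /tmp/sparklite/
#   fi
```

Comparing against a published checksum avoids relying on file dates, which change whenever the tmp dir copy is rewritten, at the cost of one small extra read from HDFS per run.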