We are trying to run a job that previously ran on Spark 1.3 on a different cluster. The job was converted to Spark 2.3, and this is a new cluster.
The job dies after completing about half a dozen stages with:

java.io.IOException: No space left on device

It appears that the nodes are using local storage for tmp. I could use help diagnosing the issue and figuring out how to fix it.

Here are the Spark conf properties:

spark.driver.extraJavaOptions=-Djava.io.tmpdir=/scratch/home/int/eva/zorzan/sparktmp/
spark.master=spark://10.141.0.34:7077
spark.mesos.executor.memoryOverhead=3128
spark.shuffle.consolidateFiles=true
spark.shuffle.spill=false
spark.app.name=Anonymous
spark.shuffle.manager=sort
spark.storage.memoryFraction=0.3
spark.jars=file:/home/int/eva/zorzan/bin/SparkHydraV2-master/HydraSparkBuilt.jar
spark.ui.killEnabled=true
spark.shuffle.spill.compress=true
spark.shuffle.sort.bypassMergeThreshold=100
com.lordjoe.distributed.marker_property=spark_property_set
spark.executor.memory=12g
spark.mesos.coarse=true
spark.shuffle.memoryFraction=0.4
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator=com.lordjoe.distributed.hydra.HydraKryoSerializer
spark.default.parallelism=360
spark.io.compression.codec=lz4
spark.reducer.maxMbInFlight=128
spark.hadoop.validateOutputSpecs=false
spark.submit.deployMode=client
spark.local.dir=/scratch/home/int/eva/zorzan/sparktmp
spark.shuffle.file.buffer.kb=1024

--
Steven M. Lewis PhD
4221 105th Ave NE
Kirkland, WA 98033
206-384-1340 (cell)
Skype lordjoe_com
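As a first diagnostic step, it helps to confirm which filesystem actually backs the scratch directory on each node and how much space it has, since shuffle spill can exhaust a small /tmp partition quickly. A minimal sketch (assumption: /tmp is a stand-in here; on the cluster you would point it at the spark.local.dir from the conf above, /scratch/home/int/eva/zorzan/sparktmp):

```shell
# Sketch: check the volume backing the Spark scratch directory on a worker.
# LOCAL_DIR is a placeholder; on the cluster use the configured spark.local.dir.
LOCAL_DIR=/tmp

# Which mount backs the directory, and how full is it?
df -h "$LOCAL_DIR"

# Available space in KB (portable POSIX output), usable as a pre-flight check
# before spark-submit.
AVAIL_KB=$(df -Pk "$LOCAL_DIR" | awk 'NR==2 {print $4}')
echo "available on $LOCAL_DIR: ${AVAIL_KB} KB"
```

Running this on every worker (e.g. via ssh in a loop) will show whether the intended scratch volume is the one filling up, or whether the executors are actually writing somewhere else.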
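One thing worth checking in standalone mode (spark.master=spark://...): per the Spark configuration docs, spark.local.dir is overridden by the SPARK_LOCAL_DIRS environment variable set on the workers, so the submitted conf alone does not guarantee executors use the scratch volume. A sketch of the worker-side setting (path copied from the conf above; workers must be restarted after the change):

```shell
# conf/spark-env.sh on every worker node (standalone mode).
# SPARK_LOCAL_DIRS takes precedence over spark.local.dir for executor
# shuffle/spill scratch files.
export SPARK_LOCAL_DIRS=/scratch/home/int/eva/zorzan/sparktmp
```

If the workers were launched without this set, it is worth verifying what directory the executor JVMs are actually writing their blockmgr-* and spill files into.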