I'm running Spark on Mesos in coarse-grained mode and am hitting serious
issues when running an application on Spark 1.6.1. We have no issues
running the same app on Spark 1.5.1 (we're trying to migrate to 1.6.1).

  

I'm running the mesos-external-shuffle service on all my slaves.

  

My command args look like the following:

  

/opt/spark-1.6.1/bin/spark-submit --master "mesos://zk://prod-zookeeper-1:2181,prod-zookeeper-2:2181,prod-zookeeper-3:2181/mesos" \
  --conf spark.ui.port=31232 \
  --conf spark.mesos.coarse=true \
  --conf spark.mesos.constraints="rack:spark" \
  --conf spark.shuffle.service.enabled=true \
  --conf spark.dynamicAllocation.enabled=false \
  --conf spark.mesos.executor.memoryOverhead=4500 \
  --conf spark.shuffle.io.connectionTimeout=3600s \
  --class com.orchard.dataloader.library.originators.prosper.LoadTrade_Prosper \
  --total-executor-cores 48 \
  --driver-memory 14G \
  --executor-memory 15G \
  --jars config.jar target/scala-2.11/dataloader-library-a65139092664f386c317b3e5908bf009015477a2-assembled.jar

  

About 30 minutes after the job starts, during the final 3 stages, I start to
see the following exceptions thrown:

  

Caused by: java.lang.RuntimeException: java.lang.RuntimeException: Executor is
not registered (appId=f3c06869-9e5f-429c-bfc3-713c8f475064-34075,
execId=f3c06869-9e5f-429c-bfc3-713c8f475064-S20)  

  

6/04/07 04:01:06 INFO DAGScheduler: Job 2 failed: first at Table.scala:49,
took 966.989764 s  
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to
stage failure: ShuffleMapStage 11 (first at Table.scala:49) has failed the
maximum allowable number of times: 4. Most recent failure reason:
org.apache.spark.shuffle.FetchFailedException: java.lang.RuntimeException:
Executor is not registered (appId=f3c06869-9e5f-429c-bfc3-713c8f475064-34075,
execId=f3c06869-9e5f-429c-bfc3-713c8f475064-S20)  
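To see which executors the shuffle service is rejecting, and how often, the driver log can be scanned for this message. The following is a minimal Python sketch, not part of the job itself: the function name `unregistered_executors` and the `SAMPLE` variable are made up for illustration, and the sample text is just the two messages quoted above. The `execId` here ends in `-S20`, which appears to be a Mesos slave ID, so grouping by it may point at a specific slave to inspect.

```python
import re
from collections import Counter

# Matches the message quoted above and captures both IDs.
PATTERN = re.compile(
    r"Executor is not registered "
    r"\(appId=(?P<app>[\w-]+),\s*execId=(?P<exec>[\w-]+)\)"
)

# The two occurrences from the log excerpt above, hard-wrapped as in the mail.
SAMPLE = """Caused by: java.lang.RuntimeException: java.lang.RuntimeException: Executor is
not registered (appId=f3c06869-9e5f-429c-bfc3-713c8f475064-34075,
execId=f3c06869-9e5f-429c-bfc3-713c8f475064-S20)
org.apache.spark.shuffle.FetchFailedException: java.lang.RuntimeException:
Executor is not registered (appId=f3c06869-9e5f-429c-bfc3-713c8f475064-34075,
execId=f3c06869-9e5f-429c-bfc3-713c8f475064-S20)
"""


def unregistered_executors(log_text):
    """Count 'Executor is not registered' messages per (appId, execId)."""
    # Log lines may be wrapped mid-message, so collapse all whitespace first.
    flat = " ".join(log_text.split())
    return Counter(
        (m.group("app"), m.group("exec")) for m in PATTERN.finditer(flat)
    )


if __name__ == "__main__":
    for (app_id, exec_id), count in unregistered_executors(SAMPLE).items():
        print(f"{count}x appId={app_id} execId={exec_id}")
```

If every failure names the same `execId`, that would suggest a problem on one slave rather than a cluster-wide one, though that is only a guess until the shuffle service logs on that slave are checked.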

  

I've tried modifying a lot of the default settings, thinking the executors are
being lost because of GC timeouts, missed heartbeats, etc.

  

The job was run with the following defaults, which still made no
difference:

spark.executor.extraJavaOptions    -Duser.timezone=UTC  
spark.driver.extraJavaOptions          -Duser.timezone=UTC  
spark.akka.timeout                 300s  
spark.network.timeout               300s  
spark.core.connection.ack.wait.timeout  300s  
spark.executor.heartbeatInterval   300s  
spark.files.fetchTimeout           120s  
spark.shuffle.service.port       31338  
spark.shuffle.compress             true  
spark.shuffle.file.buffer           128k  
spark.shuffle.io.maxRetries         5  
spark.shuffle.io.numConnectionsPerPeer  3  
spark.shuffle.service.enabled       true  
spark.files.fetchTimeout           120s  
spark.akka.timeout                 250s  
spark.dynamicAllocation.enabled     true  
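Since the list above sets some keys more than once, and the command line also passes `--conf` flags, it may be worth checking the two for repeats and conflicts. Below is a minimal Python sketch that does this. The helper names (`parse_defaults`, `repeated_keys`, `overridden_by_submit`) are invented for illustration, and the two inputs are copied from this message. Per the Spark configuration docs, flags passed to spark-submit take precedence over spark-defaults.conf.

```python
from collections import defaultdict

# Settings copied from the list above (key, whitespace, value).
DEFAULTS_TEXT = """
spark.executor.extraJavaOptions    -Duser.timezone=UTC
spark.driver.extraJavaOptions          -Duser.timezone=UTC
spark.akka.timeout                 300s
spark.network.timeout               300s
spark.core.connection.ack.wait.timeout  300s
spark.executor.heartbeatInterval   300s
spark.files.fetchTimeout           120s
spark.shuffle.service.port       31338
spark.shuffle.compress             true
spark.shuffle.file.buffer           128k
spark.shuffle.io.maxRetries         5
spark.shuffle.io.numConnectionsPerPeer  3
spark.shuffle.service.enabled       true
spark.files.fetchTimeout           120s
spark.akka.timeout                 250s
spark.dynamicAllocation.enabled     true
"""

# --conf flags copied from the spark-submit command earlier in this message.
SUBMIT_CONF = {
    "spark.ui.port": "31232",
    "spark.mesos.coarse": "true",
    "spark.mesos.constraints": "rack:spark",
    "spark.shuffle.service.enabled": "true",
    "spark.dynamicAllocation.enabled": "false",
    "spark.mesos.executor.memoryOverhead": "4500",
    "spark.shuffle.io.connectionTimeout": "3600s",
}


def parse_defaults(text):
    """Return {key: [values in order of appearance]}."""
    seen = defaultdict(list)
    for line in text.splitlines():
        parts = line.strip().split(None, 1)
        if len(parts) != 2 or parts[0].startswith("#"):
            continue  # skip blank lines, comments, and keys without a value
        seen[parts[0]].append(parts[1].strip())
    return dict(seen)


def repeated_keys(parsed):
    """Keys that appear more than once in the defaults."""
    return {k: v for k, v in parsed.items() if len(v) > 1}


def overridden_by_submit(parsed, submit_conf):
    """Keys whose last defaults value differs from the --conf value."""
    return {
        k: (parsed[k][-1], v)
        for k, v in submit_conf.items()
        if k in parsed and parsed[k][-1] != v
    }


if __name__ == "__main__":
    parsed = parse_defaults(DEFAULTS_TEXT)
    print("repeated:", repeated_keys(parsed))
    print("differs from --conf:", overridden_by_submit(parsed, SUBMIT_CONF))
```

On the settings in this message, the sketch reports `spark.akka.timeout` and `spark.files.fetchTimeout` as repeated, and `spark.dynamicAllocation.enabled` as `true` in the defaults but `false` on the command line.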

  

Does anyone have other suggestions? I'm not sure what else to try at this
point.

  

\--

**Rodrick Brown** / Senior Systems Engineer 

+1 917 445 6839 /
[rodr...@orchardplatform.com](mailto:char...@orchardplatform.com)

**Orchard Platform** 

101 5th Avenue, 4th Floor, New York, NY 10003

[http://www.orchardplatform.com](http://www.orchardplatform.com/)

[Orchard Blog](http://www.orchardplatform.com/blog/) | [Marketplace Lending
Meetup](http://www.meetup.com/Peer-to-Peer-Lending-P2P/)


