Hi there,

I am running 30 apps on my Spark cluster, and some of them get an exception like the one below:

[root@slave3 0]# cat stderr
15/06/29 17:20:08 INFO executor.CoarseGrainedExecutorBackend: Registered signal handlers for [TERM, HUP, INT]
15/06/29 17:20:09 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/06/29 17:20:09 INFO spark.SecurityManager: Changing view acls to: root
15/06/29 17:20:09 INFO spark.SecurityManager: Changing modify acls to: root
15/06/29 17:20:09 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
15/06/29 17:20:09 INFO slf4j.Slf4jLogger: Slf4jLogger started
15/06/29 17:20:09 INFO Remoting: Starting remoting
15/06/29 17:20:10 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://driverPropsFetcher@slave3:51026]
15/06/29 17:20:10 INFO util.Utils: Successfully started service 'driverPropsFetcher' on port 51026.
Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1643)
        at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:59)
        at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:128)
        at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:224)
        at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
Caused by: java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
        at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
        at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
        at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
        at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
        at scala.concurrent.Await$.result(package.scala:107)
        at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:144)
        at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:60)
        at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:59)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
        ... 4 more
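As far as I can tell, the executor is blocking in scala.concurrent.Await.result while fetching the driver's properties (the driverPropsFetcher step above). Here is a minimal standalone sketch of that failure mode, my own illustration rather than Spark's actual code:

import scala.concurrent.{Await, Promise}
import scala.concurrent.duration._

object AwaitTimeoutDemo {
  def main(args: Array[String]): Unit = {
    // A future that is never completed, standing in for a driver that
    // never answers the executor's property-fetch request.
    val never = Promise[String]().future
    // Await.result gives up after the duration and throws
    // java.util.concurrent.TimeoutException, matching
    // "Futures timed out after [30 seconds]" in the trace.
    Await.result(never, 30.seconds)
  }
}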

When I run 20 apps, everything is OK, so I suspect the executors are getting disassociated from the driver due to high I/O pressure or network latency. However, I have no idea which Spark parameter could fix this. Any ideas would be appreciated.
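
From the trace, the 30-second limit looks like the default Akka ask timeout on the executor's initial property-fetch call. As a sketch of what I plan to try in spark-defaults.conf, assuming these are the right knobs for my version (untested on my side):

spark.akka.askTimeout            120
spark.network.timeout            300s

As far as I understand, spark.network.timeout is the newer umbrella setting that several component timeouts fall back to, while spark.akka.askTimeout is the Akka-specific one in 1.x.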

Here is some information about my cluster: 1 master and 6 workers; every node has 8 cores and 12 GB of memory.
The settings in my spark-defaults.conf and spark-env.sh are as follows:

spark-defaults.conf:
spark.master                     spark://master:7077
spark.eventLog.enabled           true
spark.eventLog.dir               /var/log/spark
spark.serializer                 org.apache.spark.serializer.KryoSerializer
spark.driver.memory              8g
spark.executor.extraJavaOptions  -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"
spark.kryoserializer.buffer.max.mb    128
spark.storage.memoryFraction     0.2
spark.shuffle.memoryFraction     0.4
spark.sql.shuffle.partitions     32
spark.scheduler.mode             FAIR
spark.worker.cleanup.appDataTtl  259200
spark.port.maxRetries            10000
spark.scheduler.maxRegisteredResourcesWaitingTime   40

spark-env.sh:
export SPARK_WORKER_INSTANCES=1
export SPARK_EXECUTOR_INSTANCES=8
export SPARK_EXECUTOR_CORES=1
export SPARK_EXECUTOR_MEMORY=1g
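
For rough arithmetic, assuming each executor really gets the 1 GB from SPARK_EXECUTOR_MEMORY: the cluster totals 6 × 8 = 48 cores and 6 × 12 GB = 72 GB of worker memory, so 30 apps share fewer than 2 cores each on average, and one 1 GB executor per app already accounts for 30 GB. Capacity-wise that still fits, which is why I lean towards startup contention (many executors registering with many drivers at once) rather than raw resource exhaustion.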


--------------------------------

 

Thanks & best regards!
San.Luo
