[ https://issues.apache.org/jira/browse/SPARK-9503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sebastian YEPES FERNANDEZ updated SPARK-9503:
---------------------------------------------
    Description:
Hello,

I have just started using start-mesos-dispatcher and have been noticing some random crashes (NPEs).

Looking at the exception, it appears that in certain situations "queuedDrivers" is empty, which causes the NPE at "submission.cores":
https://github.com/apache/spark/blob/branch-1.4/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala#L512-L516

{code:title=log|borderStyle=solid}
15/07/30 23:56:44 INFO MesosRestServer: Started REST server for submitting applications on port 7077
Exception in thread "Thread-1647" java.lang.NullPointerException
        at org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler$$anonfun$scheduleTasks$1.apply(MesosClusterScheduler.scala:437)
        at org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler$$anonfun$scheduleTasks$1.apply(MesosClusterScheduler.scala:436)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler.scheduleTasks(MesosClusterScheduler.scala:436)
        at org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler.resourceOffers(MesosClusterScheduler.scala:512)
I0731 00:53:52.969518  7014 sched.cpp:1625] Asked to abort the driver
I0731 00:53:52.969895  7014 sched.cpp:861] Aborting framework '20150730-234528-4261456064-5050-61754-0000'
15/07/31 00:53:52 INFO MesosClusterScheduler: driver.run() returned with code DRIVER_ABORTED
{code}

A side effect of this NPE is that after the crash the dispatcher will not start, because it is already registered (#SPARK-7831):

{code:title=log|borderStyle=solid}
15/07/31 09:55:47 INFO MesosClusterUI: Started MesosClusterUI at http://192.168.0.254:8081
I0731 09:55:47.715039  8162 sched.cpp:157] Version: 0.23.0
I0731 09:55:47.717013  8163 sched.cpp:254] New master detected at master@192.168.0.254:5050
I0731 09:55:47.717381  8163 sched.cpp:264] No credentials provided. Attempting to register without authentication
I0731 09:55:47.718246  8177 sched.cpp:819] Got error 'Completed framework attempted to re-register'
I0731 09:55:47.718268  8177 sched.cpp:1625] Asked to abort the driver
15/07/31 09:55:47 ERROR MesosClusterScheduler: Error received: Completed framework attempted to re-register
I0731 09:55:47.719091  8177 sched.cpp:861] Aborting framework '20150730-234528-4261456064-5050-61754-0038'
15/07/31 09:55:47 INFO MesosClusterScheduler: driver.run() returned with code DRIVER_ABORTED
15/07/31 09:55:47 INFO Utils: Shutdown hook called
{code}

I can get around this by removing the zk data:

{code:title=zkCli.sh|borderStyle=solid}
rmr /spark_mesos_dispatcher
{code}

  was:
Hello,

I have just started using start-mesos-dispatcher and have been noticing some random crashes (NPEs).

Looking at the exception, it appears that in certain situations "queuedDrivers" is empty, which causes the NPE at "submission.cores":
https://github.com/apache/spark/blob/branch-1.4/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala#L512-L516

{code:title=log|borderStyle=solid}
15/07/30 23:56:44 INFO MesosRestServer: Started REST server for submitting applications on port 7077
Exception in thread "Thread-1647" java.lang.NullPointerException
        at org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler$$anonfun$scheduleTasks$1.apply(MesosClusterScheduler.scala:437)
        at org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler$$anonfun$scheduleTasks$1.apply(MesosClusterScheduler.scala:436)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler.scheduleTasks(MesosClusterScheduler.scala:436)
        at org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler.resourceOffers(MesosClusterScheduler.scala:512)
I0731 00:53:52.969518  7014 sched.cpp:1625] Asked to abort the driver
I0731 00:53:52.969895  7014 sched.cpp:861] Aborting framework '20150730-234528-4261456064-5050-61754-0000'
15/07/31 00:53:52 INFO MesosClusterScheduler: driver.run() returned with code DRIVER_ABORTED
{code}

A side effect of this NPE is that after the crash the dispatcher will not start, because it is already registered (#SPARK-7831).

I can get around this by removing the zk data:

{code:title=zkCli.sh|borderStyle=solid}
rmr /spark_mesos_dispatcher
{code}


> Mesos dispatcher NullPointerException (MesosClusterScheduler)
> -------------------------------------------------------------
>
>                 Key: SPARK-9503
>                 URL: https://issues.apache.org/jira/browse/SPARK-9503
>             Project: Spark
>          Issue Type: Bug
>          Components: Mesos
>    Affects Versions: 1.4.1
>         Environment: branch-1.4 #8dfdca46dd2f527bf653ea96777b23652bc4eb83
>            Reporter: Sebastian YEPES FERNANDEZ
>              Labels: mesosphere
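
For anyone trying to reproduce the failure mode outside the dispatcher, here is a minimal sketch of the general pattern the stack trace suggests: a field access on an element inside a foreach-style loop. It is written in Java rather than the Scala of MesosClusterScheduler, and every name in it (QueuedDriverGuard, Submission, totalCoresUnguarded, totalCoresGuarded) is hypothetical; it is not Spark's code and makes no claim about the actual root cause of this issue. It only illustrates that iterating an empty buffer never runs the loop body, whereas a null entry in the buffer does throw on the field access, and that a null check is one possible guard.

```java
import java.util.ArrayList;
import java.util.List;

public class QueuedDriverGuard {
    // Hypothetical stand-in for a queued driver submission; not Spark's class.
    static final class Submission {
        final String id;
        final double cores;

        Submission(String id, double cores) {
            this.id = id;
            this.cores = cores;
        }
    }

    // Unguarded: dereferences every element, so a null entry throws
    // NullPointerException at the field access.
    static double totalCoresUnguarded(List<Submission> queued) {
        double total = 0.0;
        for (Submission s : queued) {
            total += s.cores;
        }
        return total;
    }

    // Guarded: skips null entries instead of dereferencing them.
    static double totalCoresGuarded(List<Submission> queued) {
        double total = 0.0;
        for (Submission s : queued) {
            if (s == null) {
                continue;
            }
            total += s.cores;
        }
        return total;
    }

    public static void main(String[] args) {
        List<Submission> queued = new ArrayList<>();

        // Empty list: the loop body never runs, so no exception is thrown.
        System.out.println(totalCoresUnguarded(queued));

        queued.add(new Submission("driver-1", 2.0));
        queued.add(null); // assumed scenario: a null entry ends up in the queue

        System.out.println(totalCoresGuarded(queued));

        try {
            totalCoresUnguarded(queued);
        } catch (NullPointerException e) {
            System.out.println("NPE on null entry");
        }
    }
}
```

If the real cause turns out to be a null element or a null field on a submission, a guard of this kind would only hide the symptom; finding out how the null gets into the queue would still be needed.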