Frank Rosner created ZEPPELIN-2258: -------------------------------------- Summary: Spark driver not closed on interpreter restart Key: ZEPPELIN-2258 URL: https://issues.apache.org/jira/browse/ZEPPELIN-2258 Project: Zeppelin Issue Type: Bug Components: Interpreters Affects Versions: 0.7.0 Environment: Spark Standalone 2.0.2 Reporter: Frank Rosner
h4. Problem Sometimes we experience an issue that when restarting the Spark interpreter, the driver is not shutdown correctly. Due to this, the applications stays submitted to the (standalone) cluster and still consumes resources. Looking at the logs there is not much to see, at least no error message. How does Zeppelin terminate the Spark session when restarting the interpreter? {noformat} grep "SparkUI" zeppelin-interpreter-spark-zeppelin-zeppelin-02.log INFO [2017-03-13 14:01:06,415] ({pool-2-thread-3} Logging.scala[logInfo]:54) - Successfully started service 'SparkUI' on port 42935. INFO [2017-03-13 14:01:06,418] ({pool-2-thread-3} Logging.scala[logInfo]:54) - Bound SparkUI to 0.0.0.0, and started at http://xxxxxxxxxxx:42935 WARN [2017-03-13 17:27:58,704] ({pool-2-thread-2} Logging.scala[logWarning]:66) - Service 'SparkUI' could not bind on port 42935. Attempting port 42936. INFO [2017-03-13 17:27:58,731] ({pool-2-thread-2} Logging.scala[logInfo]:54) - Successfully started service 'SparkUI' on port 42936. INFO [2017-03-13 17:27:58,737] ({pool-2-thread-2} Logging.scala[logInfo]:54) - Bound SparkUI to 0.0.0.0, and started at http://xxxxxxxxxxx:42936 {noformat} {noformat} grep -B 300 "could not bind on port 42935" zeppelin-interpreter-spark-zeppelin-zeppelin-02.log INFO [2017-03-13 17:26:19,998] ({pool-1-thread-1} RemoteInterpreterServer.java[cancel]:535) - cancel org.apache.zeppelin.spark.PySparkInterpreter 20170313-171538_890555184 INFO [2017-03-13 17:26:20,000] ({pool-1-thread-1} Logging.scala[logInfo]:54) - Asked to cancel job group zeppelin-2CB7T91AF-20170313-171538_890555184 INFO [2017-03-13 17:26:22,192] ({pool-1-thread-1} RemoteInterpreterServer.java[cancel]:535) - cancel org.apache.zeppelin.spark.PySparkInterpreter 20170313-171538_890555184 INFO [2017-03-13 17:26:22,193] ({pool-1-thread-1} Logging.scala[logInfo]:54) - Asked to cancel job group zeppelin-2CB7T91AF-20170313-171538_890555184 INFO [2017-03-13 17:26:22,397] ({pool-1-thread-1} RemoteInterpreterServer.java[cancel]:535) - cancel org.apache.zeppelin.spark.PySparkInterpreter 20170313-171538_890555184 INFO [2017-03-13 17:26:22,397] ({pool-1-thread-1} Logging.scala[logInfo]:54) - Asked to cancel job group zeppelin-2CB7T91AF-20170313-171538_890555184 INFO [2017-03-13 17:26:24,604] ({pool-1-thread-1} RemoteInterpreterServer.java[cancel]:535) - cancel org.apache.zeppelin.spark.PySparkInterpreter 20170313-171538_890555184 INFO [2017-03-13 17:26:24,604] ({pool-1-thread-1} Logging.scala[logInfo]:54) - Asked to cancel job group zeppelin-2CB7T91AF-20170313-171538_890555184 INFO [2017-03-13 17:27:44,299] ({pool-1-thread-1} RemoteInterpreterServer.java[cancel]:535) - cancel org.apache.zeppelin.spark.PySparkInterpreter 20170313-171538_890555184 INFO [2017-03-13 17:27:44,299] ({pool-1-thread-1} Logging.scala[logInfo]:54) - Asked to cancel job group zeppelin-2CB7T91AF-20170313-171538_890555184 INFO [2017-03-13 17:27:44,344] ({pool-1-thread-1} RemoteInterpreterServer.java[cancel]:535) - cancel org.apache.zeppelin.spark.PySparkInterpreter 20170313-171538_890555184 INFO [2017-03-13 17:27:44,344] ({pool-1-thread-1} Logging.scala[logInfo]:54) - Asked to cancel job group zeppelin-2CB7T91AF-20170313-171538_890555184 INFO [2017-03-13 17:27:44,347] ({pool-1-thread-1} RemoteInterpreterServer.java[cancel]:535) - cancel org.apache.zeppelin.spark.PySparkInterpreter 20170313-171538_890555184 INFO [2017-03-13 17:27:44,347] ({pool-1-thread-1} Logging.scala[logInfo]:54) - Asked to cancel job group zeppelin-2CB7T91AF-20170313-171538_890555184 INFO [2017-03-13 17:27:44,348] ({pool-1-thread-1} RemoteInterpreterServer.java[cancel]:535) - cancel org.apache.zeppelin.spark.PySparkInterpreter 20170313-171538_890555184 INFO [2017-03-13 17:27:44,348] ({pool-1-thread-1} Logging.scala[logInfo]:54) - Asked to cancel job group zeppelin-2CB7T91AF-20170313-171538_890555184 INFO [2017-03-13 17:27:52,196] ({Thread-0} RemoteInterpreterServer.java[run]:96) - Starting remote interpreter server on port 47609 INFO [2017-03-13 17:27:52,561] ({pool-1-thread-2} RemoteInterpreterServer.java[createInterpreter]:196) - Instantiate interpreter org.apache.zeppelin.spark.PySparkInterpreter INFO [2017-03-13 17:27:52,617] ({pool-1-thread-2} RemoteInterpreterServer.java[createInterpreter]:196) - Instantiate interpreter org.apache.zeppelin.spark.SparkInterpreter INFO [2017-03-13 17:27:52,623] ({pool-1-thread-2} RemoteInterpreterServer.java[createInterpreter]:196) - Instantiate interpreter org.apache.zeppelin.spark.SparkSqlInterpreter INFO [2017-03-13 17:27:52,626] ({pool-1-thread-2} RemoteInterpreterServer.java[createInterpreter]:196) - Instantiate interpreter org.apache.zeppelin.spark.DepInterpreter INFO [2017-03-13 17:27:52,673] ({pool-2-thread-2} SchedulerFactory.java[jobStarted]:131) - Job remoteInterpretJob_1489422472669 started by scheduler interpreter_1754241267 INFO [2017-03-13 17:27:52,682] ({pool-2-thread-2} PySparkInterpreter.java[createPythonScript]:106) - File /tmp/zeppelin_pyspark-142944822224657232.py created INFO [2017-03-13 17:27:56,198] ({pool-2-thread-75} SchedulerFactory.java[jobFinished]:137) - Job remoteInterpretJob_1489422348563 finished by scheduler interpreter_999576780 INFO [2017-03-13 17:27:56,583] ({pool-2-thread-2} SparkInterpreter.java[createSparkSession]:350) - ------ Create new SparkContext spark://yyyyyyyyyyyyy:7077 ------- INFO [2017-03-13 17:27:56,638] ({pool-2-thread-2} Logging.scala[logInfo]:54) - Running Spark version 2.0.2 WARN [2017-03-13 17:27:57,427] ({pool-2-thread-2} NativeCodeLoader.java[<clinit>]:62) - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable INFO [2017-03-13 17:27:57,664] ({pool-2-thread-2} Logging.scala[logInfo]:54) - Changing view acls to: zeppelin INFO [2017-03-13 17:27:57,665] ({pool-2-thread-2} Logging.scala[logInfo]:54) - Changing modify acls to: zeppelin INFO [2017-03-13 17:27:57,665] ({pool-2-thread-2} Logging.scala[logInfo]:54) - Changing view acls groups to: INFO [2017-03-13 17:27:57,666] ({pool-2-thread-2} Logging.scala[logInfo]:54) - Changing modify acls groups to: INFO [2017-03-13 17:27:57,666] ({pool-2-thread-2} Logging.scala[logInfo]:54) - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(zeppelin); groups with view permissions: Set(); users with modify permissions: Set(zeppelin); groups with modify permissions: Set() INFO [2017-03-13 17:27:58,148] ({pool-2-thread-2} Logging.scala[logInfo]:54) - Successfully started service 'sparkDriver' on port 56861. INFO [2017-03-13 17:27:58,179] ({pool-2-thread-2} Logging.scala[logInfo]:54) - Registering MapOutputTracker INFO [2017-03-13 17:27:58,212] ({pool-2-thread-2} Logging.scala[logInfo]:54) - Registering BlockManagerMaster INFO [2017-03-13 17:27:58,234] ({pool-2-thread-2} Logging.scala[logInfo]:54) - Created local directory at /tmp/blockmgr-5dd56028-ee7c-4996-9011-245dcc7f3d6b INFO [2017-03-13 17:27:58,254] ({pool-2-thread-2} Logging.scala[logInfo]:54) - MemoryStore started with capacity 4.1 GB INFO [2017-03-13 17:27:58,324] ({pool-2-thread-2} Logging.scala[logInfo]:54) - Registering OutputCommitCoordinator INFO [2017-03-13 17:27:58,487] ({pool-2-thread-2} Log.java[initialized]:186) - Logging initialized @7223ms INFO [2017-03-13 17:27:58,630] ({pool-2-thread-2} Server.java[doStart]:327) - jetty-9.2.z-SNAPSHOT INFO [2017-03-13 17:27:58,656] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@4a482724{/jobs,null,AVAILABLE} INFO [2017-03-13 17:27:58,657] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@3f531ec4{/jobs/json,null,AVAILABLE} INFO [2017-03-13 17:27:58,657] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@3f300fc5{/jobs/job,null,AVAILABLE} INFO [2017-03-13 17:27:58,658] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@b656e64{/jobs/job/json,null,AVAILABLE} INFO [2017-03-13 17:27:58,659] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@9f092b{/stages,null,AVAILABLE} INFO [2017-03-13 17:27:58,659] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@279d92c2{/stages/json,null,AVAILABLE} INFO [2017-03-13 17:27:58,660] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@186cefd3{/stages/stage,null,AVAILABLE} INFO [2017-03-13 17:27:58,661] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@38dab4f4{/stages/stage/json,null,AVAILABLE} INFO [2017-03-13 17:27:58,661] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@40f42495{/stages/pool,null,AVAILABLE} INFO [2017-03-13 17:27:58,662] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@bb4a5cb{/stages/pool/json,null,AVAILABLE} INFO [2017-03-13 17:27:58,663] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@74d9c119{/storage,null,AVAILABLE} INFO [2017-03-13 17:27:58,663] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@19c9a662{/storage/json,null,AVAILABLE} INFO [2017-03-13 17:27:58,664] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@78f8f942{/storage/rdd,null,AVAILABLE} INFO [2017-03-13 17:27:58,664] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@56cc816b{/storage/rdd/json,null,AVAILABLE} INFO [2017-03-13 17:27:58,665] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@6ca753a2{/environment,null,AVAILABLE} INFO [2017-03-13 17:27:58,666] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@380902e2{/environment/json,null,AVAILABLE} INFO [2017-03-13 17:27:58,666] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@704ce48{/executors,null,AVAILABLE} INFO [2017-03-13 17:27:58,667] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@35f1c01a{/executors/json,null,AVAILABLE} INFO [2017-03-13 17:27:58,668] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@631daf45{/executors/threadDump,null,AVAILABLE} INFO [2017-03-13 17:27:58,669] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@13f3afd6{/executors/threadDump/json,null,AVAILABLE} INFO [2017-03-13 17:27:58,676] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@3224456e{/static,null,AVAILABLE} INFO [2017-03-13 17:27:58,677] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@9eea220{/,null,AVAILABLE} INFO [2017-03-13 17:27:58,679] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@70743df{/api,null,AVAILABLE} INFO [2017-03-13 17:27:58,679] ({pool-2-thread-2} ContextHandler.java[doStart]:744) - Started o.s.j.s.ServletContextHandler@984c1f6{/stages/stage/kill,null,AVAILABLE} WARN [2017-03-13 17:27:58,682] ({pool-2-thread-2} AbstractLifeCycle.java[setFailed]:212) - FAILED ServerConnector@3a4224e0{HTTP/1.1}{0.0.0.0:42935}: java.net.BindException: Address already in use WARN [2017-03-13 17:27:58,686] ({pool-2-thread-2} AbstractLifeCycle.java[setFailed]:212) - FAILED org.spark_project.jetty.server.Server@26ba9b5f: java.net.BindException: Address already in use INFO [2017-03-13 17:27:58,689] ({pool-2-thread-2} AbstractConnector.java[doStop]:306) - Stopped ServerConnector@3a4224e0{HTTP/1.1}{0.0.0.0:42935} INFO [2017-03-13 17:27:58,691] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@984c1f6{/stages/stage/kill,null,UNAVAILABLE} INFO [2017-03-13 17:27:58,692] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@70743df{/api,null,UNAVAILABLE} INFO [2017-03-13 17:27:58,692] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@9eea220{/,null,UNAVAILABLE} INFO [2017-03-13 17:27:58,692] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@3224456e{/static,null,UNAVAILABLE} INFO [2017-03-13 17:27:58,692] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@13f3afd6{/executors/threadDump/json,null,UNAVAILABLE} INFO [2017-03-13 17:27:58,693] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@631daf45{/executors/threadDump,null,UNAVAILABLE} INFO [2017-03-13 17:27:58,693] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@35f1c01a{/executors/json,null,UNAVAILABLE} INFO [2017-03-13 17:27:58,693] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@704ce48{/executors,null,UNAVAILABLE} INFO [2017-03-13 17:27:58,693] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@380902e2{/environment/json,null,UNAVAILABLE} INFO [2017-03-13 17:27:58,694] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@6ca753a2{/environment,null,UNAVAILABLE} INFO [2017-03-13 17:27:58,694] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@56cc816b{/storage/rdd/json,null,UNAVAILABLE} INFO [2017-03-13 17:27:58,694] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@78f8f942{/storage/rdd,null,UNAVAILABLE} INFO [2017-03-13 17:27:58,694] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@19c9a662{/storage/json,null,UNAVAILABLE} INFO [2017-03-13 17:27:58,695] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@74d9c119{/storage,null,UNAVAILABLE} INFO [2017-03-13 17:27:58,695] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@bb4a5cb{/stages/pool/json,null,UNAVAILABLE} INFO [2017-03-13 17:27:58,695] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@40f42495{/stages/pool,null,UNAVAILABLE} INFO [2017-03-13 17:27:58,695] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@38dab4f4{/stages/stage/json,null,UNAVAILABLE} INFO [2017-03-13 17:27:58,696] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@186cefd3{/stages/stage,null,UNAVAILABLE} INFO [2017-03-13 17:27:58,696] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@279d92c2{/stages/json,null,UNAVAILABLE} INFO [2017-03-13 17:27:58,696] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@9f092b{/stages,null,UNAVAILABLE} INFO [2017-03-13 17:27:58,697] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@b656e64{/jobs/job/json,null,UNAVAILABLE} INFO [2017-03-13 17:27:58,697] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@3f300fc5{/jobs/job,null,UNAVAILABLE} INFO [2017-03-13 17:27:58,697] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@3f531ec4{/jobs/json,null,UNAVAILABLE} INFO [2017-03-13 17:27:58,697] ({pool-2-thread-2} ContextHandler.java[doStop]:865) - Stopped o.s.j.s.ServletContextHandler@4a482724{/jobs,null,UNAVAILABLE} WARN [2017-03-13 17:27:58,704] ({pool-2-thread-2} Logging.scala[logWarning]:66) - Service 'SparkUI' could not bind on port 42935. Attempting port 42936. {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346)