Philipp Dallig created ZEPPELIN-5897:
----------------------------------------
Summary: Spark-Interpreter context change
Key: ZEPPELIN-5897
URL: https://issues.apache.org/jira/browse/ZEPPELIN-5897
Project: Zeppelin
Issue Type: Bug
Components: spark
Reporter: Philipp Dallig
I have encountered some strange behaviour in the Spark interpreter. The
problem occurs when several cron jobs are started in parallel.
The interpreter launch command itself looks correct:
{code:java}
[INFO] Interpreter launch command:
/opt/conda/lib/python3.9/site-packages/pyspark/bin/spark-submit --class
org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer
--driver-class-path
/usr/share/java/*:/tmp/local-repo/spark_8g_8g/*:/opt/zeppelin/interpreter/spark/*:::/opt/zeppelin/interpreter/zeppelin-interpreter-shaded-0.11.0-SNAPSHOT.jar:/opt/zeppelin/interpreter/spark/spark-interpreter-0.11.0-SNAPSHOT.jar
--driver-java-options -Dfile.encoding=UTF-8
-Dlog4j.configuration=file:///opt/zeppelin/conf/log4j.properties
-Dlog4j.configurationFile=file:///opt/zeppelin/conf/log4j2.properties
-Dzeppelin.log.file=/opt/zeppelin/logs/zeppelin-interpreter-spark_8g_8g-isolated-2G8V2J18D-2023-04-11_00-00-00--spark8g8g-isolated-2g8v2j18d-2023-04-1100-00-00-upuren.log
--conf spark.driver.maxResultSize=8g --conf
spark.kubernetes.executor.request.cores=0. --conf spark.network.timeout=1800
--conf spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension --conf
spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog
--verbose --conf spark.jars.ivySettings=/opt/spark/ivysettings.xml --proxy-user
ejavaheri --conf spark.master=k8s://https://kubernetes.default.svc --conf
spark.driver.memory=8g --conf spark.driver.cores=2 --conf
spark.app.name=spark_8g_8g --conf
spark.driver.host=spark8g8g-isolated-2g8v2j18d-2023-04-1100-00-00-upuren.spark.svc
--conf spark.kubernetes.memoryOverheadFactor=0.4 --conf
spark.webui.yarn.useProxy=false --conf spark.blockManager.port=22322 --conf
spark.driver.port=22321 --conf spark.driver.bindAddress=0.0.0.0 --conf
spark.kubernetes.namespace=spark --conf
spark.kubernetes.driver.request.cores=200m --conf
spark.kubernetes.driver.pod.name=spark8g8g-isolated-2g8v2j18d-2023-04-1100-00-00-upuren
--conf spark.executor.instances=1 --conf spark.executor.memory=8g --conf
spark.executor.cores=4 --conf spark.submit.deployMode=client --conf
spark.kubernetes.container.image=harbor.mycompany.com/dap/zeppelin-executor:3.3
/opt/zeppelin/interpreter/spark/spark-interpreter-0.11.0-SNAPSHOT.jar
zeppelin-server.spark.svc 12320
spark_8g_8g-isolated-2G8V2J18D-2023-04-11_00-00-00 12321:12321{code}
As you can see, the config value `spark.driver.host` is
`spark8g8g-isolated-2g8v2j18d-2023-04-1100-00-00-upuren.spark.svc`, which is
correct.
During start-up, however, the host seems to change. The new name is:
{code:java}
spark2g4g-isolated-2d8reueys-2023-04-1100-00-00-fbvrgw.spark.svc {code}
The new name is the hostname of the other cron job running in parallel. How is
it possible that the Spark driver host changes? Is Zeppelin even able to do
this?
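My suspicion, and it is only a suspicion: the two interpreter launches running
at the same time share some mutable configuration state, so the last writer
wins for both jobs. A minimal, hypothetical Java sketch of such a race (not
actual Zeppelin code; the shared `Properties` object and the `launch` helper
are invented for illustration):
{code:java}
import java.util.Properties;

// Hypothetical sketch, NOT Zeppelin code: two interpreter launches that
// read their Spark settings back from one shared mutable Properties object.
public class DriverHostRace {
    private static final Properties shared = new Properties();

    private static void launch(String driverHost) {
        // Each job writes "its" driver host into the shared object ...
        shared.setProperty("spark.driver.host", driverHost);
        try {
            Thread.sleep(10); // simulated start-up work
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        // ... but by the time it reads the value back, the other job
        // may already have overwritten it.
        System.out.println(Thread.currentThread().getName()
                + " sees spark.driver.host = "
                + shared.getProperty("spark.driver.host"));
    }

    public static void main(String[] args) throws InterruptedException {
        Thread a = new Thread(() -> launch(
                "spark8g8g-isolated-2g8v2j18d-2023-04-1100-00-00-upuren.spark.svc"));
        Thread b = new Thread(() -> launch(
                "spark2g4g-isolated-2d8reueys-2023-04-1100-00-00-fbvrgw.spark.svc"));
        a.start();
        b.start();
        a.join();
        b.join();
    }
}
{code}
Here is the start-up log of the spark_8g_8g interpreter; note the "Added JAR
... at spark://spark2g4g-..." line near the end: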
{code:java}
INFO [2023-04-11 00:00:04,288] ({RegisterThread} RemoteInterpreterServer.java[run]:620) - Start registration
INFO [2023-04-11 00:00:04,288] ({RemoteInterpreterServer-Thread} RemoteInterpreterServer.java[run]:200) - Launching ThriftServer at 10.129.4.191:12321
INFO [2023-04-11 00:00:05,409] ({RegisterThread} RemoteInterpreterServer.java[run]:634) - Registering interpreter process
INFO [2023-04-11 00:00:05,433] ({RegisterThread} RemoteInterpreterServer.java[run]:636) - Registered interpreter process
INFO [2023-04-11 00:00:05,433] ({RegisterThread} RemoteInterpreterServer.java[run]:657) - Registration finished
WARN [2023-04-11 00:00:05,517] ({pool-3-thread-1} ZeppelinConfiguration.java[<init>]:87) - Failed to load XML configuration, proceeding with a default,for a stacktrace activate the debug log
INFO [2023-04-11 00:00:05,522] ({pool-3-thread-1} ZeppelinConfiguration.java[create]:137) - Server Host: 127.0.0.1
INFO [2023-04-11 00:00:05,523] ({pool-3-thread-1} ZeppelinConfiguration.java[create]:144) - Zeppelin Version: 0.11.0-SNAPSHOT
INFO [2023-04-11 00:00:05,522] ({pool-3-thread-1} ZeppelinConfiguration.java[create]:141) - Server Port: 8080
INFO [2023-04-11 00:00:05,523] ({pool-3-thread-1} ZeppelinConfiguration.java[create]:143) - Context Path: /
INFO [2023-04-11 00:00:05,531] ({pool-3-thread-1} RemoteInterpreterServer.java[createLifecycleManager]:293) - Creating interpreter lifecycle manager: org.apache.zeppelin.interpreter.lifecycle.TimeoutLifecycleManager
INFO [2023-04-11 00:00:05,535] ({pool-3-thread-1} RemoteInterpreterServer.java[init]:236) - Creating RemoteInterpreterEventClient with connection pool size: 100
INFO [2023-04-11 00:00:05,535] ({pool-3-thread-1} TimeoutLifecycleManager.java[onInterpreterProcessStarted]:73) - Interpreter process: spark_8g_8g-isolated-2G8V2J18D-2023-04-11_00-00-00 is started
INFO [2023-04-11 00:00:05,535] ({pool-3-thread-1} TimeoutLifecycleManager.java[<init>]:67) - TimeoutLifecycleManager is started with checkInterval: 60000, timeoutThreshold: 3600000
INFO [2023-04-11 00:00:05,627] ({pool-3-thread-1} RemoteInterpreterServer.java[createInterpreter]:406) - Instantiate interpreter org.apache.zeppelin.spark.SparkInterpreter, isForceShutdown: true
INFO [2023-04-11 00:00:05,635] ({pool-3-thread-1} RemoteInterpreterServer.java[createInterpreter]:406) - Instantiate interpreter org.apache.zeppelin.spark.SparkSqlInterpreter, isForceShutdown: true
INFO [2023-04-11 00:00:05,645] ({pool-3-thread-1} RemoteInterpreterServer.java[createInterpreter]:406) - Instantiate interpreter org.apache.zeppelin.spark.PySparkInterpreter, isForceShutdown: true
INFO [2023-04-11 00:00:05,655] ({pool-3-thread-1} RemoteInterpreterServer.java[createInterpreter]:406) - Instantiate interpreter org.apache.zeppelin.spark.IPySparkInterpreter, isForceShutdown: true
INFO [2023-04-11 00:00:05,663] ({pool-3-thread-1} RemoteInterpreterServer.java[createInterpreter]:406) - Instantiate interpreter org.apache.zeppelin.spark.SparkRInterpreter, isForceShutdown: true
INFO [2023-04-11 00:00:05,670] ({pool-3-thread-1} RemoteInterpreterServer.java[createInterpreter]:406) - Instantiate interpreter org.apache.zeppelin.spark.SparkIRInterpreter, isForceShutdown: true
INFO [2023-04-11 00:00:05,679] ({pool-3-thread-1} RemoteInterpreterServer.java[createInterpreter]:406) - Instantiate interpreter org.apache.zeppelin.spark.SparkShinyInterpreter, isForceShutdown: true
INFO [2023-04-11 00:00:05,753] ({pool-3-thread-1} RemoteInterpreterServer.java[createInterpreter]:406) - Instantiate interpreter org.apache.zeppelin.spark.KotlinSparkInterpreter, isForceShutdown: true
INFO [2023-04-11 00:00:05,806] ({pool-3-thread-1} SchedulerFactory.java[createOrGetFIFOScheduler]:76) - Create FIFOScheduler: interpreter_688737023
INFO [2023-04-11 00:00:05,806] ({pool-3-thread-1} SchedulerFactory.java[<init>]:56) - Scheduler Thread Pool Size: 100
INFO [2023-04-11 00:00:05,810] ({FIFOScheduler-interpreter_688737023-Worker-1} AbstractScheduler.java[runJob]:127) - Job 20210622-101638_112853005 started by scheduler interpreter_688737023
INFO [2023-04-11 00:00:05,818] ({pool-3-thread-2} SchedulerFactory.java[createOrGetFIFOScheduler]:76) - Create FIFOScheduler: interpreter_839216362
INFO [2023-04-11 00:00:05,818] ({pool-3-thread-2} SchedulerFactory.java[createOrGetParallelScheduler]:88) - Create ParallelScheduler: org.apache.zeppelin.spark.SparkSqlInterpreter1135593921 with maxConcurrency: 10
INFO [2023-04-11 00:00:05,857] ({FIFOScheduler-interpreter_688737023-Worker-1} SparkInterpreter.java[extractScalaVersion]:279) - Using Scala: version 2.12.15
INFO [2023-04-11 00:00:05,881] ({FIFOScheduler-interpreter_688737023-Worker-1} SparkScala212Interpreter.scala[createSparkILoop]:182) - Scala shell repl output dir: /tmp/spark16004603505225443508
INFO [2023-04-11 00:00:06,113] ({FIFOScheduler-interpreter_688737023-Worker-1} SparkScala212Interpreter.scala[createSparkILoop]:191) - UserJars: file:/opt/zeppelin/interpreter/spark/spark-interpreter-0.11.0-SNAPSHOT.jar:/opt/zeppelin/interpreter/spark/scala-2.12/spark-scala-2.12-0.11.0-SNAPSHOT.jar
INFO [2023-04-11 00:00:11,260] ({FIFOScheduler-interpreter_688737023-Worker-1} HiveConf.java[findConfigFile]:187) - Found configuration file file:/opt/conda/lib/python3.9/site-packages/pyspark/conf/hive-site.xml
INFO [2023-04-11 00:00:11,438] ({FIFOScheduler-interpreter_688737023-Worker-1} Logging.scala[logInfo]:61) - Running Spark version 3.3.0
INFO [2023-04-11 00:00:11,472] ({FIFOScheduler-interpreter_688737023-Worker-1} Logging.scala[logInfo]:61) - No custom resources configured for spark.driver.
INFO [2023-04-11 00:00:11,472] ({FIFOScheduler-interpreter_688737023-Worker-1} Logging.scala[logInfo]:61) - ==============================================================
INFO [2023-04-11 00:00:11,471] ({FIFOScheduler-interpreter_688737023-Worker-1} Logging.scala[logInfo]:61) - ==============================================================
INFO [2023-04-11 00:00:11,473] ({FIFOScheduler-interpreter_688737023-Worker-1} Logging.scala[logInfo]:61) - Submitted application: spark_8g_8g
INFO [2023-04-11 00:00:11,500] ({FIFOScheduler-interpreter_688737023-Worker-1} Logging.scala[logInfo]:61) - Default ResourceProfile created, executor resources: Map(cores -> name: cores, amount: 4, script: , vendor: , memory -> name: memory, amount: 8192, script: , vendor: , offHeap -> name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0)
INFO [2023-04-11 00:00:11,512] ({FIFOScheduler-interpreter_688737023-Worker-1} Logging.scala[logInfo]:61) - Limiting resource is cpus at 4 tasks per executor
INFO [2023-04-11 00:00:11,515] ({FIFOScheduler-interpreter_688737023-Worker-1} Logging.scala[logInfo]:61) - Added ResourceProfile id: 0
INFO [2023-04-11 00:00:11,580] ({FIFOScheduler-interpreter_688737023-Worker-1} Logging.scala[logInfo]:61) - Changing view acls to: zeppelin,ejavaheri
INFO [2023-04-11 00:00:11,580] ({FIFOScheduler-interpreter_688737023-Worker-1} Logging.scala[logInfo]:61) - Changing modify acls to: zeppelin,ejavaheri
INFO [2023-04-11 00:00:11,581] ({FIFOScheduler-interpreter_688737023-Worker-1} Logging.scala[logInfo]:61) - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(zeppelin, ejavaheri); groups with view permissions: Set(); users with modify permissions: Set(zeppelin, ejavaheri); groups with modify permissions: Set()
INFO [2023-04-11 00:00:11,581] ({FIFOScheduler-interpreter_688737023-Worker-1} Logging.scala[logInfo]:61) - Changing modify acls groups to:
INFO [2023-04-11 00:00:11,581] ({FIFOScheduler-interpreter_688737023-Worker-1} Logging.scala[logInfo]:61) - Changing view acls groups to:
INFO [2023-04-11 00:00:11,852] ({FIFOScheduler-interpreter_688737023-Worker-1} Logging.scala[logInfo]:61) - Successfully started service 'sparkDriver' on port 22321.
INFO [2023-04-11 00:00:11,880] ({FIFOScheduler-interpreter_688737023-Worker-1} Logging.scala[logInfo]:61) - Registering MapOutputTracker
INFO [2023-04-11 00:00:11,912] ({FIFOScheduler-interpreter_688737023-Worker-1} Logging.scala[logInfo]:61) - Registering BlockManagerMaster
INFO [2023-04-11 00:00:11,946] ({FIFOScheduler-interpreter_688737023-Worker-1} Logging.scala[logInfo]:61) - Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
INFO [2023-04-11 00:00:11,947] ({FIFOScheduler-interpreter_688737023-Worker-1} Logging.scala[logInfo]:61) - BlockManagerMasterEndpoint up
INFO [2023-04-11 00:00:11,950] ({FIFOScheduler-interpreter_688737023-Worker-1} Logging.scala[logInfo]:61) - Registering BlockManagerMasterHeartbeat
INFO [2023-04-11 00:00:11,975] ({FIFOScheduler-interpreter_688737023-Worker-1} Logging.scala[logInfo]:61) - Created local directory at /tmp/blockmgr-1903d257-be01-4cb7-954f-9a5c13ab0598
INFO [2023-04-11 00:00:11,993] ({FIFOScheduler-interpreter_688737023-Worker-1} Logging.scala[logInfo]:61) - MemoryStore started with capacity 4.6 GiB
INFO [2023-04-11 00:00:12,010] ({FIFOScheduler-interpreter_688737023-Worker-1} Logging.scala[logInfo]:61) - Registering OutputCommitCoordinator
INFO [2023-04-11 00:00:12,079] ({FIFOScheduler-interpreter_688737023-Worker-1} Log.java[initialized]:170) - Logging initialized @9839ms to org.sparkproject.jetty.util.log.Slf4jLog
INFO [2023-04-11 00:00:12,193] ({FIFOScheduler-interpreter_688737023-Worker-1} Server.java[doStart]:375) - jetty-9.4.46.v20220331; built: 2022-03-31T16:38:08.030Z; git: bc17a0369a11ecf40bb92c839b9ef0a8ac50ea18; jvm 11.0.17+8-post-Ubuntu-1ubuntu220.04
INFO [2023-04-11 00:00:12,223] ({FIFOScheduler-interpreter_688737023-Worker-1} Server.java[doStart]:415) - Started @9983ms
INFO [2023-04-11 00:00:12,273] ({FIFOScheduler-interpreter_688737023-Worker-1} AbstractConnector.java[doStart]:333) - Started ServerConnector@325be8be{HTTP/1.1, (http/1.1)}{0.0.0.0:4040}
INFO [2023-04-11 00:00:12,274] ({FIFOScheduler-interpreter_688737023-Worker-1} Logging.scala[logInfo]:61) - Successfully started service 'SparkUI' on port 4040.
INFO [2023-04-11 00:00:12,310] ({FIFOScheduler-interpreter_688737023-Worker-1} ContextHandler.java[doStart]:921) - Started o.s.j.s.ServletContextHandler@47745fce{/,null,AVAILABLE,@Spark}
INFO [2023-04-11 00:00:12,342] ({FIFOScheduler-interpreter_688737023-Worker-1} Logging.scala[logInfo]:61) - Added JAR file:/opt/zeppelin/interpreter/spark/spark-interpreter-0.11.0-SNAPSHOT.jar at spark://spark2g4g-isolated-2d8reueys-2023-04-1100-00-00-fbvrgw.spark.svc:22321/jars/spark-interpreter-0.11.0-SNAPSHOT.jar with timestamp 1681164011433
INFO [2023-04-11 00:00:12,413] ({FIFOScheduler-interpreter_688737023-Worker-1} Logging.scala[logInfo]:61) - Auto-configuring K8S client using current context from users K8S config file {code}
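For anyone reproducing this: the effective value can be read back at runtime
with the standard Spark API. A small illustration (the class name is invented;
in a Zeppelin `%spark` paragraph the session is already injected as `spark`):
{code:java}
import org.apache.spark.sql.SparkSession;

// Illustration only: print the driver host the running context actually uses.
public class CheckDriverHost {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.active();
        System.out.println("spark.driver.host = "
                + spark.sparkContext().getConf().get("spark.driver.host"));
    }
}
{code}
In the log above, the "Added JAR ... at
spark://spark2g4g-isolated-2d8reueys-2023-04-1100-00-00-fbvrgw.spark.svc:22321/..."
line already shows that the context ended up with the other job's driver host.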