[https://issues.apache.org/jira/browse/LIVY-541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16760029#comment-16760029]
Ruslan Dautkhanov edited comment on LIVY-541 at 2/4/19 5:11 PM:
----------------------------------------------------------------
It might be a one-liner change here:
[https://github.com/apache/incubator-livy/blob/56c76bc2d4563593edce062a563603fe63e5a431/server/src/main/scala/org/apache/livy/server/interactive/InteractiveSession.scala#L99]
Change
{code:scala}
builderProperties.getOrElseUpdate("spark.app.name", s"livy-session-$id")
{code}
to
{code:scala}
builderProperties.getOrElseUpdate("spark.app.name", appTag)
{code}
I don't think there is a drawback to doing it that way.
A better approach might be to add a configuration knob
`livy.yarn.session.prefix` so that each Livy server can be given a distinct
prefix and their app names wouldn't overlap; see the sketch below.
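A minimal, self-contained sketch of both variants (assumptions: `id`, `appTag`, and `builderProperties` are stand-ins for the values already in scope in InteractiveSession.scala, and `livy.yarn.session.prefix` is a hypothetical new config entry, read from a system property here only so the sketch runs standalone):
{code:scala}
import scala.collection.mutable

object AppNameDefaultSketch {
  // Stand-ins for values InteractiveSession already has in scope.
  val id: Int = 197
  val appTag: String = s"livy-session-$id-uveqmqyj"   // session id + random suffix
  val builderProperties = mutable.Map[String, String]()

  def main(args: Array[String]): Unit = {
    // Hypothetical knob; not an existing LivyConf entry.
    val sessionPrefix = sys.props.get("livy.yarn.session.prefix")

    // Prefix variant: a per-server prefix keeps names distinct across servers.
    // One-liner variant (fallback): default to the unique tag instead of the
    // bare "livy-session-$id".
    val defaultName = sessionPrefix
      .map(prefix => s"$prefix-session-$id")
      .getOrElse(appTag)

    builderProperties.getOrElseUpdate("spark.app.name", defaultName)
    println(builderProperties("spark.app.name"))      // livy-session-197-uveqmqyj
  }
}
{code}
Either way, the default app name stops colliding between servers, because it carries either a per-server prefix or the per-session random suffix.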
> Multiple Livy servers submitting to Yarn results in LivyException: Session is
> finished ... No YARN application is found with tag livy-session-197-uveqmqyj
> in 300 seconds. Please check your cluster status, it is may be very busy
> --------------------------------------------------------------------------------
>
> Key: LIVY-541
> URL: https://issues.apache.org/jira/browse/LIVY-541
> Project: Livy
> Issue Type: Bug
> Components: Server
> Affects Versions: 0.5.0
> Environment: Hortonworks HDP 2.6
> Reporter: Hari Sekhon
> Priority: Critical
>
> It appears Livy doesn't differentiate sessions properly in Yarn, causing
> errors when multiple Livy servers run behind a load balancer for HA /
> performance scaling on the same Hadoop cluster.
> Each Livy server uses monotonically incrementing session IDs with a random
> suffix, but the random suffix doesn't appear to be passed to Yarn. The Livy
> server that is further behind in session numbers then fails with the errors
> below, because it finds that a session with the same number has already
> finished (it was submitted earlier by a different user through another Livy
> server, as seen in the Yarn RM UI):
> {code:java}
> org.apache.zeppelin.livy.LivyException: Session 197 is finished, appId: null,
> log: [ at
> org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:2887), at
> org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer.run(FileSystem.java:2904),
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511),
> at java.util.concurrent.FutureTask.run(FutureTask.java:266), at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142),
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617),
> at java.lang.Thread.run(Thread.java:748),
> YARN Diagnostics: , java.lang.Exception: No YARN application is found with
> tag livy-session-197-uveqmqyj in 300 seconds. Please check your cluster
> status, it is may be very busy.,
> org.apache.livy.utils.SparkYarnApp.org$apache$livy$utils$SparkYarnApp$$getAppIdFromTag(SparkYarnApp.scala:182)
>
> org.apache.livy.utils.SparkYarnApp$$anonfun$1$$anonfun$4.apply(SparkYarnApp.scala:239)
>
> org.apache.livy.utils.SparkYarnApp$$anonfun$1$$anonfun$4.apply(SparkYarnApp.scala:236)
> scala.Option.getOrElse(Option.scala:120)
> org.apache.livy.utils.SparkYarnApp$$anonfun$1.apply$mcV$sp(SparkYarnApp.scala:236)
> org.apache.livy.Utils$$anon$1.run(Utils.scala:94)]
> at
> org.apache.zeppelin.livy.BaseLivyInterpreter.createSession(BaseLivyInterpreter.java:300)
> at
> org.apache.zeppelin.livy.BaseLivyInterpreter.initLivySession(BaseLivyInterpreter.java:184)
> at
> org.apache.zeppelin.livy.LivySharedInterpreter.open(LivySharedInterpreter.java:57)
> at
> org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:69)
> at
> org.apache.zeppelin.livy.BaseLivyInterpreter.getLivySharedInterpreter(BaseLivyInterpreter.java:165)
> at
> org.apache.zeppelin.livy.BaseLivyInterpreter.open(BaseLivyInterpreter.java:139)
> at
> org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:69)
> at
> org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:493)
> at org.apache.zeppelin.scheduler.Job.run(Job.java:175)
> at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:139)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:748)
> {code}
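To make the overlap concrete, here is a minimal sketch of the numbering scheme described above (assumed mechanics, not taken from the Livy sources): each server keeps its own monotonically incrementing counter, so two servers can both reach session 197, and only the random suffix in the YARN tag tells the applications apart.
{code:scala}
import java.util.concurrent.atomic.AtomicInteger
import scala.util.Random

object SessionIdCollisionSketch {
  // Hypothetical model of per-server session numbering.
  final class LivyServer {
    private val nextId = new AtomicInteger(197)
    def newSession(): (String, String) = {
      val id = nextId.getAndIncrement()
      val suffix = Random.alphanumeric.take(8).mkString.toLowerCase
      (s"livy-session-$id",          // overlaps across servers
       s"livy-session-$id-$suffix")  // unique thanks to the random suffix
    }
  }

  def main(args: Array[String]): Unit = {
    val (nameA, tagA) = (new LivyServer).newSession()
    val (nameB, tagB) = (new LivyServer).newSession()
    println(s"$nameA == $nameB")  // both are livy-session-197
    println(s"$tagA != $tagB")    // tags differ only in the suffix
  }
}
{code}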