[
https://issues.apache.org/jira/browse/LIVY-541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Gyorgy Gal updated LIVY-541:
----------------------------
Fix Version/s: 0.10.0
(was: 0.9.0)
This issue has been moved to the 0.10.0 release as part of a bulk update. If
you feel it was moved inappropriately, feel free to provide justification and
reset the Fix Version to 0.9.0.
> Multiple Livy servers submitting to Yarn results in LivyException: Session is
> finished ... No YARN application is found with tag livy-session-197-uveqmqyj
> in 300 seconds. Please check your cluster status, it is may be very busy
> ------------------------------------------------------------------------------
>
> Key: LIVY-541
> URL: https://issues.apache.org/jira/browse/LIVY-541
> Project: Livy
> Issue Type: Bug
> Components: Server
> Affects Versions: 0.5.0
> Environment: Hortonworks HDP 2.6
> Reporter: Hari Sekhon
> Priority: Critical
> Fix For: 0.10.0
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> It appears Livy doesn't differentiate sessions properly in Yarn, which causes
> errors when multiple Livy servers run behind a load balancer for HA /
> performance scaling on the same Hadoop cluster.
> Each Livy server assigns monotonically incrementing session IDs with a random
> suffix, but it appears the random suffix isn't passed to Yarn. As a result,
> the Livy server that is further behind in session numbers fails with the
> errors below: it finds that a session with the same number has already
> finished, because that session was submitted earlier by a different user on
> another Livy server (as seen in the Yarn RM UI):
> {code:java}
> org.apache.zeppelin.livy.LivyException: Session 197 is finished, appId: null, log: [
>  at org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:2887),
>  at org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer.run(FileSystem.java:2904),
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511),
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266),
>  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142),
>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617),
>  at java.lang.Thread.run(Thread.java:748),
> YARN Diagnostics: ,
> java.lang.Exception: No YARN application is found with tag livy-session-197-uveqmqyj in 300 seconds. Please check your cluster status, it is may be very busy.,
>  org.apache.livy.utils.SparkYarnApp.org$apache$livy$utils$SparkYarnApp$$getAppIdFromTag(SparkYarnApp.scala:182)
>  org.apache.livy.utils.SparkYarnApp$$anonfun$1$$anonfun$4.apply(SparkYarnApp.scala:239)
>  org.apache.livy.utils.SparkYarnApp$$anonfun$1$$anonfun$4.apply(SparkYarnApp.scala:236)
>  scala.Option.getOrElse(Option.scala:120)
>  org.apache.livy.utils.SparkYarnApp$$anonfun$1.apply$mcV$sp(SparkYarnApp.scala:236)
>  org.apache.livy.Utils$$anon$1.run(Utils.scala:94)]
>  at org.apache.zeppelin.livy.BaseLivyInterpreter.createSession(BaseLivyInterpreter.java:300)
>  at org.apache.zeppelin.livy.BaseLivyInterpreter.initLivySession(BaseLivyInterpreter.java:184)
>  at org.apache.zeppelin.livy.LivySharedInterpreter.open(LivySharedInterpreter.java:57)
>  at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:69)
>  at org.apache.zeppelin.livy.BaseLivyInterpreter.getLivySharedInterpreter(BaseLivyInterpreter.java:165)
>  at org.apache.zeppelin.livy.BaseLivyInterpreter.open(BaseLivyInterpreter.java:139)
>  at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:69)
>  at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:493)
>  at org.apache.zeppelin.scheduler.Job.run(Job.java:175)
>  at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:139)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
>  at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  at java.lang.Thread.run(Thread.java:748)
> {code}
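The failure mode described in the report can be sketched as follows. This is a minimal illustration, not Livy's actual implementation: the names `makeTag` and `lookup` are hypothetical. The point it demonstrates is that two servers behind a load balancer can both reach session 197, so the application lookup against YARN must match the complete tag, including the random suffix, and never just the `livy-session-<id>` prefix.

```java
import java.util.Optional;
import java.util.Set;

public class SessionTagSketch {
    // Tag format seen in the report: "livy-session-<id>-<randomSuffix>".
    static String makeTag(int sessionId, String suffix) {
        return "livy-session-" + sessionId + "-" + suffix;
    }

    // Lookup by the full tag. Matching only on the session-number prefix
    // could find an application submitted by a *different* Livy server.
    static Optional<String> lookup(Set<String> yarnTags, String fullTag) {
        return yarnTags.stream().filter(t -> t.equals(fullTag)).findFirst();
    }

    public static void main(String[] args) {
        // Two Livy servers both reach session 197, each with its own
        // random suffix (the second suffix here is made up).
        String serverA = makeTag(197, "uveqmqyj");
        String serverB = makeTag(197, "abcdwxyz");
        Set<String> yarnTags = Set.of(serverA); // only server A's app started

        assert !serverA.equals(serverB);            // full tags don't collide
        assert lookup(yarnTags, serverB).isEmpty(); // B's app is not A's app
        assert lookup(yarnTags, serverA).isPresent();
        System.out.println("ok");
    }
}
```

If the suffix is dropped anywhere between submission (the tag attached to the YARN application) and lookup, the two sessions become indistinguishable and the server further behind in session numbers sees the other server's finished application, exactly as reported.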
--
This message was sent by Atlassian Jira
(v8.20.10#820010)