[ https://issues.apache.org/jira/browse/LIVY-541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16953290#comment-16953290 ]

Jeffrey(Xilang) Yan commented on LIVY-541:
------------------------------------------

If I remember correctly, Livy looks for the app on YARN by tag, not by 
application ID, which is what the log shows: No YARN application is found 
with *tag* livy-session-197-uveqmqyj in 300 seconds. 
Could you reproduce this with the 0.6 release, and have you checked whether 
your application on YARN actually started?
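
For reference, a minimal sketch of how a tag-based lookup against the YARN 
ResourceManager can work, using Hadoop's YarnClient API. This is an 
illustration of the general technique, not Livy's exact code; the helper 
name, tag value, timeout, and poll interval are all illustrative:

{code:scala}
import scala.collection.JavaConverters._
import org.apache.hadoop.yarn.api.records.ApplicationId
import org.apache.hadoop.yarn.client.api.YarnClient
import org.apache.hadoop.yarn.conf.YarnConfiguration

// Hypothetical helper: poll the RM for an application carrying the given
// tag. Returns None if no application with that tag appears within the
// deadline, which is the situation the "No YARN application is found with
// tag ..." error reports.
def findAppIdByTag(tag: String, timeoutMs: Long = 300000L): Option[ApplicationId] = {
  val client = YarnClient.createYarnClient()
  client.init(new YarnConfiguration())
  client.start()
  try {
    val deadline = System.currentTimeMillis() + timeoutMs
    while (System.currentTimeMillis() < deadline) {
      // YARN lower-cases tags on submission, so compare in lower case.
      val hit = client.getApplications().asScala
        .find(_.getApplicationTags.contains(tag.toLowerCase))
      if (hit.isDefined) return hit.map(_.getApplicationId)
      Thread.sleep(1000) // illustrative poll interval
    }
    None
  } finally {
    client.stop()
  }
}
{code}

Note that this scans all applications the RM reports and filters by tag, so 
two servers only collide if they submit the exact same tag string.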

> Multiple Livy servers submitting to Yarn results in LivyException: Session is 
> finished ... No YARN application is found with tag livy-session-197-uveqmqyj 
> in 300 seconds. Please check your cluster status, it is may be very busy
> ---------------------------------------------------------------------------
>
>                 Key: LIVY-541
>                 URL: https://issues.apache.org/jira/browse/LIVY-541
>             Project: Livy
>          Issue Type: Bug
>          Components: Server
>    Affects Versions: 0.5.0
>         Environment: Hortonworks HDP 2.6
>            Reporter: Hari Sekhon
>            Priority: Critical
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> It appears Livy doesn't differentiate sessions properly in YARN, causing 
> errors when running multiple Livy servers behind a load balancer for HA / 
> performance scaling on the same Hadoop cluster.
> Each Livy server uses monotonically incrementing session IDs with a random 
> suffix, but the random suffix doesn't appear to be passed to YARN. As a 
> result, the Livy server that is further behind in session numbers hits the 
> errors below, because it appears to find that a session with the same 
> number has already finished (submitted earlier by a different user on 
> another Livy server, as seen in the YARN RM UI); a sketch of the tag 
> mechanism follows the stack trace:
> {code:java}
> org.apache.zeppelin.livy.LivyException: Session 197 is finished, appId: null, log: [
>   at org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:2887),
>   at org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer.run(FileSystem.java:2904),
>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511),
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266),
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142),
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617),
>   at java.lang.Thread.run(Thread.java:748),
> YARN Diagnostics: ,
> java.lang.Exception: No YARN application is found with tag livy-session-197-uveqmqyj in 300 seconds. Please check your cluster status, it is may be very busy.,
> org.apache.livy.utils.SparkYarnApp.org$apache$livy$utils$SparkYarnApp$$getAppIdFromTag(SparkYarnApp.scala:182)
> org.apache.livy.utils.SparkYarnApp$$anonfun$1$$anonfun$4.apply(SparkYarnApp.scala:239)
> org.apache.livy.utils.SparkYarnApp$$anonfun$1$$anonfun$4.apply(SparkYarnApp.scala:236)
> scala.Option.getOrElse(Option.scala:120)
> org.apache.livy.utils.SparkYarnApp$$anonfun$1.apply$mcV$sp(SparkYarnApp.scala:236)
> org.apache.livy.Utils$$anon$1.run(Utils.scala:94)]
> at org.apache.zeppelin.livy.BaseLivyInterpreter.createSession(BaseLivyInterpreter.java:300)
> at org.apache.zeppelin.livy.BaseLivyInterpreter.initLivySession(BaseLivyInterpreter.java:184)
> at org.apache.zeppelin.livy.LivySharedInterpreter.open(LivySharedInterpreter.java:57)
> at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:69)
> at org.apache.zeppelin.livy.BaseLivyInterpreter.getLivySharedInterpreter(BaseLivyInterpreter.java:165)
> at org.apache.zeppelin.livy.BaseLivyInterpreter.open(BaseLivyInterpreter.java:139)
> at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:69)
> at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:493)
> at org.apache.zeppelin.scheduler.Job.run(Job.java:175)
> at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:139)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:748)
> {code}
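
For context on the tag mechanism referenced above: the server attaches the 
per-session tag to the Spark job at submission time, then polls the RM for 
an application bearing that tag. A minimal sketch, assuming Spark's standard 
spark.yarn.tags property and the tag format seen in the log; the session ID, 
suffix, paths, and class name are all illustrative, not Livy's actual 
values:

{code:scala}
import org.apache.spark.launcher.SparkLauncher

// Hypothetical tag construction mirroring the format seen in the log:
// "livy-session-<id>-<random suffix>". If only the numeric session ID were
// used, two Livy servers could both produce the tag "livy-session-197"; the
// random suffix is what keeps the tags distinct across servers.
val sessionId = 197                     // illustrative session ID
val suffix = "uveqmqyj"                 // illustrative random suffix
val appTag = s"livy-session-$sessionId-$suffix"

// spark.yarn.tags is the standard Spark-on-YARN property that forwards
// comma-separated tags to the ResourceManager at submission time.
val launcher = new SparkLauncher()
  .setAppResource("/path/to/app.jar")   // illustrative
  .setMainClass("com.example.Main")     // illustrative
  .setMaster("yarn")
  .setConf("spark.yarn.tags", appTag)

val process = launcher.launch()         // submit; the app now carries appTag
{code}

If the suffix really were dropped somewhere along this path, the tag the 
server later searches for would not match any application, producing exactly 
the "No YARN application is found with tag ..." error quoted above.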


