Hari Sekhon created LIVY-541:
--------------------------------

             Summary: Multiple Livy servers submitting to Yarn results in 
LivyException: Session is finished ... No YARN application is found with tag 
livy-session-197-uveqmqyj in 300 seconds. Please check your cluster status, it 
is may be very busy
                 Key: LIVY-541
                 URL: https://issues.apache.org/jira/browse/LIVY-541
             Project: Livy
          Issue Type: Bug
          Components: Server
    Affects Versions: 0.5.0
         Environment: Hortonworks HDP 2.6
            Reporter: Hari Sekhon


It appears Livy doesn't differentiate sessions properly in YARN, causing errors 
when multiple Livy servers run behind a load balancer for HA and performance 
scaling on the same Hadoop cluster.

Each Livy server uses monotonically incrementing session IDs with a hash suffix, 
but it appears the hash suffix isn't passed through to YARN. This produces the 
following error on whichever Livy server is further behind in session numbers, 
because it finds that a session with the same number has already finished 
(submitted earlier by a different user on another Livy server, as seen in the 
YARN RM UI):
{code:java}
org.apache.zeppelin.livy.LivyException: Session 197 is finished, appId: null, log: [
    at org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:2887),
    at org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer.run(FileSystem.java:2904),
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511),
    at java.util.concurrent.FutureTask.run(FutureTask.java:266),
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142),
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617),
    at java.lang.Thread.run(Thread.java:748),
    YARN Diagnostics: ,
    java.lang.Exception: No YARN application is found with tag livy-session-197-uveqmqyj in 300 seconds. Please check your cluster status, it is may be very busy.,
    org.apache.livy.utils.SparkYarnApp.org$apache$livy$utils$SparkYarnApp$$getAppIdFromTag(SparkYarnApp.scala:182)
    org.apache.livy.utils.SparkYarnApp$$anonfun$1$$anonfun$4.apply(SparkYarnApp.scala:239)
    org.apache.livy.utils.SparkYarnApp$$anonfun$1$$anonfun$4.apply(SparkYarnApp.scala:236)
    scala.Option.getOrElse(Option.scala:120)
    org.apache.livy.utils.SparkYarnApp$$anonfun$1.apply$mcV$sp(SparkYarnApp.scala:236)
    org.apache.livy.Utils$$anon$1.run(Utils.scala:94)]
	at org.apache.zeppelin.livy.BaseLivyInterpreter.createSession(BaseLivyInterpreter.java:300)
	at org.apache.zeppelin.livy.BaseLivyInterpreter.initLivySession(BaseLivyInterpreter.java:184)
	at org.apache.zeppelin.livy.LivySharedInterpreter.open(LivySharedInterpreter.java:57)
	at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:69)
	at org.apache.zeppelin.livy.BaseLivyInterpreter.getLivySharedInterpreter(BaseLivyInterpreter.java:165)
	at org.apache.zeppelin.livy.BaseLivyInterpreter.open(BaseLivyInterpreter.java:139)
	at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:69)
	at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:493)
	at org.apache.zeppelin.scheduler.Job.run(Job.java:175)
	at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:139)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:748)
{code}
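
For reference, the lookup that fails above is a tag-based poll of the YARN 
ResourceManager: Livy tags each submitted application with 
livy-session-<id>-<suffix> and polls YARN until an application carrying that tag 
appears, giving up after 300 seconds. The sketch below only illustrates the 
pattern; it is not Livy's actual code, and the method/parameter names, the 
"SPARK" application-type filter, and the 1-second poll interval in it are 
assumptions:
{code:scala}
import java.util.concurrent.TimeoutException

import scala.collection.JavaConverters._
import scala.concurrent.duration._

import org.apache.hadoop.yarn.client.api.YarnClient
import org.apache.hadoop.yarn.conf.YarnConfiguration

// Minimal sketch of a tag-based YARN lookup in the style of
// SparkYarnApp.getAppIdFromTag. Names, the "SPARK" app-type filter and
// the 1s poll interval are illustrative assumptions, not Livy's code.
object TagLookupSketch {

  def getAppIdFromTag(appTag: String,
                      pollInterval: FiniteDuration = 1.second,
                      timeout: FiniteDuration = 300.seconds): String = {
    val yarnClient = YarnClient.createYarnClient()
    yarnClient.init(new YarnConfiguration())
    yarnClient.start()

    val deadline = timeout.fromNow
    try {
      while (deadline.hasTimeLeft()) {
        // List SPARK applications and match on the YARN application tag.
        // The tag carries both the session id and the per-session suffix,
        // e.g. "livy-session-197-uveqmqyj"; if only the numeric part were
        // effectively unique, two Livy servers whose counters have both
        // reached 197 would be indistinguishable here.
        val matched = yarnClient.getApplications(Set("SPARK").asJava).asScala
          .find(_.getApplicationTags.contains(appTag.toLowerCase))
        matched match {
          case Some(report) => return report.getApplicationId.toString
          case None         => Thread.sleep(pollInterval.toMillis)
        }
      }
      // This is the failure reported above: no application in YARN
      // carries the expected tag within the timeout window.
      throw new TimeoutException(
        s"No YARN application is found with tag $appTag in ${timeout.toSeconds} seconds.")
    } finally {
      yarnClient.stop()
    }
  }
}
{code}
Since the session id alone is only unique per Livy server, the suffix is what 
should disambiguate sessions across servers. If it isn't propagated into the 
submitted application's YARN tags, the poll for livy-session-197-uveqmqyj never 
matches anything and times out as above, while the bare session number collides 
with session 197 submitted by the other Livy server.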


