[ 
https://issues.apache.org/jira/browse/LIVY-718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17005434#comment-17005434
 ] 

Bikas Saha commented on LIVY-718:
---------------------------------

The Livy server may hold in-memory state, but could that state be re-created 
from the state in the Spark driver with an initial sync operation?

If not, then what additional metadata could be stored in the Spark driver to 
make it happen?

The ideal situation would be (keeping in mind Meisam's observations):
 # Any Livy client can hit any Livy server and continue from where it left off. 
The first time a Livy server is hit for a session, it may take some time to 
hydrate the state if that was not already done in the background.
 ## Note that this can happen even without any Livy server failure, in cases 
where a load balancer is running in front of the Livy servers and sticky 
sessions are not working or there is too much hot-spotting.
 # A Livy server can (with some extra sync operation if needed) service any 
session from that session's Spark driver. The only information it needs is 
how to connect to the Spark driver. That could be stored in a reliable state 
store (e.g. even in a YARN application tag for YARN clusters).

If we can achieve the above then the system could be much simpler to operate 
and work with.
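
To make the two points above concrete, here is a minimal sketch of the lazy 
hydration idea. It is not Livy code: the class and field names 
(LivyServerSketch, state_store, sync_from_driver) are hypothetical, the 
"reliable state store" is modelled as a plain dict from session id to driver 
address, and the sync operation is a stand-in callable rather than a real RPC 
to the Spark driver.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SessionState:
    session_id: int
    driver_address: str
    statements: List[str]


@dataclass
class LivyServerSketch:
    """Hypothetical stateless server: all it needs is the driver address."""
    name: str
    state_store: Dict[int, str]                   # shared: session id -> driver address
    sync_from_driver: Callable[[str], List[str]]  # assumed sync operation
    cache: Dict[int, SessionState] = field(default_factory=dict)
    hydrations: int = 0

    def get_session(self, session_id: int) -> SessionState:
        # First hit for this session on this server: hydrate from the driver.
        if session_id not in self.cache:
            address = self.state_store[session_id]
            statements = self.sync_from_driver(address)
            self.cache[session_id] = SessionState(session_id, address, statements)
            self.hydrations += 1
        return self.cache[session_id]


# Stand-ins for the shared store and for the state held by a Spark driver.
state_store = {7: "driver-host:10000"}
driver_state = {"driver-host:10000": ["1 + 1", "spark.range(10).count()"]}


def fake_sync(address: str) -> List[str]:
    return list(driver_state[address])


server_a = LivyServerSketch("a", state_store, fake_sync)
server_b = LivyServerSketch("b", state_store, fake_sync)

# A client can hit either server and see the same session.
from_a = server_a.get_session(7)
from_b = server_b.get_session(7)
server_a.get_session(7)  # second hit on the same server reuses the cache
```

The sketch ignores cache invalidation: if another server changes the session 
through the driver, a cached copy goes stale, so a real design would need 
either a re-sync on each request or some notification from the driver.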

IIRC JDBC had a REST and an RPC mode. The RPC mode might not be HA without a 
fat client, but perhaps the REST mode could be. Does Hive JDBC support HA on 
the Hive Thrift server? If so, then maybe the Hive JDBC client already 
supports server-side transitions. If not, then we may have the caveat that HA 
won't work for such connections. I am not super familiar with the JDBC client.

 

> Support multi-active high availability in Livy
> ----------------------------------------------
>
>                 Key: LIVY-718
>                 URL: https://issues.apache.org/jira/browse/LIVY-718
>             Project: Livy
>          Issue Type: Epic
>          Components: RSC, Server
>            Reporter: Yiheng Wang
>            Priority: Major
>
> In this JIRA we want to discuss how to implement multi-active high 
> availability in Livy.
> Currently, Livy only supports single node recovery. This is not sufficient in 
> some production environments. In our scenario, the Livy server serves many 
> notebook and JDBC services. We want to make Livy service more fault-tolerant 
> and scalable.
> There're already some proposals in the community for high availability. But 
> they're not so complete or just for active-standby high availability. So we 
> propose a multi-active high availability design to achieve the following 
> goals:
> # One or more servers will serve the client requests at the same time.
> # Sessions are allocated among different servers.
> # When one node crashes, the affected sessions will be moved to other active 
> services.
> Here's our design document, please review and comment:
> https://docs.google.com/document/d/1bD3qYZpw14_NuCcSGUOfqQ0pqvSbCQsOLFuZp26Ohjc/edit?usp=sharing
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
