[ 
https://issues.apache.org/jira/browse/LIVY-718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17014723#comment-17014723
 ] 

shanyu zhao commented on LIVY-718:
----------------------------------

[~jerryshao] the active-standby HA for Livy server is meant to address 
hardware/networking failures and upgrade scenarios on the active Livy server. 
When the active Livy server goes offline, the standby Livy server becomes 
active, reads the state from Zookeeper, and starts to serve requests. This aims 
at High Availability rather than scalability.
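
To illustrate the failover sequence described above, here is a minimal sketch. 
It is not taken from the LIVY-11 patch: StateStore is an in-memory stand-in for 
Zookeeper, and every class and method name is hypothetical. In a real 
deployment the shared store would detect the failed server itself (for 
example, through a session timeout) rather than rely on an explicit release.

```python
# Illustrative only: StateStore stands in for Zookeeper, and none of these
# names come from Livy or the LIVY-11 patch.

class StateStore:
    """In-memory stand-in for a shared store such as Zookeeper."""

    def __init__(self):
        self.leader = None   # id of the server currently holding leadership
        self.sessions = {}   # session id -> persisted session state

    def try_acquire_leadership(self, server_id):
        # Leadership is granted only if nobody holds it right now.
        if self.leader is None:
            self.leader = server_id
        return self.leader == server_id

    def release_leadership(self, server_id):
        if self.leader == server_id:
            self.leader = None


class LivyServer:
    """Hypothetical server that is either active or standby."""

    def __init__(self, server_id, store):
        self.server_id = server_id
        self.store = store
        self.active = False
        self.sessions = {}

    def try_become_active(self):
        if self.store.try_acquire_leadership(self.server_id):
            # Recover the session state persisted by the previous active server.
            self.sessions = dict(self.store.sessions)
            self.active = True
        return self.active

    def create_session(self, session_id, state):
        if not self.active:
            raise RuntimeError("standby server does not serve requests")
        self.sessions[session_id] = state
        self.store.sessions[session_id] = state  # persist for recovery

    def go_offline(self):
        # Simulates a crash or an upgrade of this server.
        self.active = False
        self.sessions = {}
        self.store.release_leadership(self.server_id)


store = StateStore()
primary = LivyServer("livy-1", store)
standby = LivyServer("livy-2", store)

primary.try_become_active()
standby_active_early = standby.try_become_active()  # primary still leads
primary.create_session(0, "idle")

primary.go_offline()
standby.try_become_active()   # standby takes over and recovers state
recovered = standby.sessions
```

Only one server serves requests at any time in this sketch, which is why it 
helps availability but not scalability.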

The active-active proposal in this PR seems to be more geared towards 
scalability. The designated server proposal by [~yihengw] is simpler and more 
realistic to implement. As far as I know, HiveServer2 HA also uses the 
designated server approach. The stateless proposal by [~bikassaha] is more 
desirable but much harder to implement. Many in-memory states, such as access 
times, would need to be moved to a persistent store, and some variables may 
need locks.
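
As a rough sketch of what I mean by the designated server approach: I am 
assuming it means that each session is owned by exactly one server, requests 
for that session are routed to its owner, and sessions are reassigned when the 
owner fails. The SessionAllocator class and the least-loaded policy below are 
hypothetical and are not taken from the design document or from HiveServer2.

```python
# Hypothetical sketch: the class name and the least-loaded allocation policy
# are assumptions made for illustration only.

class SessionAllocator:
    def __init__(self, servers):
        self.servers = list(servers)  # servers currently alive
        self.owner = {}               # session id -> designated server

    def _least_loaded(self):
        load = {server: 0 for server in self.servers}
        for server in self.owner.values():
            load[server] += 1
        # On ties, min() returns the first server in self.servers.
        return min(self.servers, key=lambda server: load[server])

    def assign(self, session_id):
        self.owner[session_id] = self._least_loaded()
        return self.owner[session_id]

    def route(self, session_id):
        # Every request for a session goes to its designated server.
        return self.owner[session_id]

    def server_failed(self, server):
        # Drop the failed server, then move its sessions to the survivors.
        self.servers.remove(server)
        orphaned = [sid for sid, s in self.owner.items() if s == server]
        for sid in orphaned:
            del self.owner[sid]
        for sid in orphaned:
            self.assign(sid)
        return orphaned


allocator = SessionAllocator(["livy-1", "livy-2"])
for sid in range(4):
    allocator.assign(sid)

owner_before = allocator.route(0)
moved = allocator.server_failed("livy-1")
owner_after = allocator.route(0)
```

In this sketch only the session-to-server mapping has to be shared, whereas 
the stateless design would need all per-session state in the shared store.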

I think it is beneficial to first have active-standby HA (LIVY-11) checked in 
while this PR is being worked on, especially since it satisfies users who need 
HA rather than scalability. 

> Support multi-active high availability in Livy
> ----------------------------------------------
>
>                 Key: LIVY-718
>                 URL: https://issues.apache.org/jira/browse/LIVY-718
>             Project: Livy
>          Issue Type: Epic
>          Components: RSC, Server
>            Reporter: Yiheng Wang
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> In this JIRA we want to discuss how to implement multi-active high 
> availability in Livy.
> Currently, Livy only supports single node recovery. This is not sufficient in 
> some production environments. In our scenario, the Livy server serves many 
> notebook and JDBC services. We want to make the Livy service more 
> fault-tolerant and scalable.
> There are already some proposals in the community for high availability, but 
> they are either incomplete or cover only active-standby high availability. So 
> we propose a multi-active high availability design to achieve the following 
> goals:
> # One or more servers will serve the client requests at the same time.
> # Sessions are allocated among different servers.
> # When one node crashes, the affected sessions will be moved to other active 
> services.
> Here's our design document, please review and comment:
> https://docs.google.com/document/d/1bD3qYZpw14_NuCcSGUOfqQ0pqvSbCQsOLFuZp26Ohjc/edit?usp=sharing
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
