Yiheng Wang commented on LIVY-718:

I updated the design doc based on the recent discussions in the JIRA. The major 
changes are:
 # Refine the solution architecture section
 # Add a new allocateServer method to the allocator interface
 # Add details for the node-session mapping allocation method [~mgaido]
 # Update getAllSession [~mgaido]
 # Refine the section comparing client-side routing and server-side routing
 # Add a new section "Load Balancer", which gives an example of how to put Livy 
servers behind a load balancer when using client-side routing [~bikassaha] 
 # Add a new section "Session Recover", which describes how we recover a session 
object lazily (when a request for that session arrives); this can also be 
leveraged in a multi-designated server solution [~bikassaha]
 # Remove session recovery on server failover
 # Add multi-designate server to the non-goals and add a new section 
"Multi-designate Server Solution Extension" to discuss how the current design 
could be extended into a multi-designate server solution [~bikassaha]
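
To make the lazy recovery idea from the "Session Recover" item concrete, here is a minimal Python sketch. It is not taken from the design doc or from Livy's code: the names LazySessionManager, state_store, live and recover_count are hypothetical, and a plain dict stands in for whatever shared state store the servers use. It only illustrates that a session object is rebuilt on the first request for it rather than eagerly at server start or failover.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Session:
    session_id: int
    state: str


@dataclass
class LazySessionManager:
    # Hypothetical stand-in for the shared, persistent state store.
    state_store: Dict[int, str]
    # Session objects currently materialized in this server's memory.
    live: Dict[int, Session] = field(default_factory=dict)
    # Counts how many sessions were rebuilt from the state store.
    recover_count: int = 0

    def get(self, session_id: int) -> Optional[Session]:
        """Return the session, recovering it from the store on first use."""
        session = self.live.get(session_id)
        if session is None and session_id in self.state_store:
            # Lazy recovery: build the object only when a request arrives.
            session = Session(session_id, self.state_store[session_id])
            self.live[session_id] = session
            self.recover_count += 1
        return session


manager = LazySessionManager(state_store={1: "idle", 2: "busy"})
first = manager.get(1)   # recovered from the store on this first request
again = manager.get(1)   # served from memory, no second recovery
missing = manager.get(99)  # unknown session id, nothing to recover
```

In this sketch nothing is recovered until a request names a session, so a server taking over after a failure pays the recovery cost per session and only for sessions that are actually used.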

> Support multi-active high availability in Livy
> ----------------------------------------------
>                 Key: LIVY-718
>                 URL: https://issues.apache.org/jira/browse/LIVY-718
>             Project: Livy
>          Issue Type: Epic
>          Components: RSC, Server
>            Reporter: Yiheng Wang
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
> In this JIRA we want to discuss how to implement multi-active high 
> availability in Livy.
> Currently, Livy only supports single-node recovery. This is not sufficient in 
> some production environments. In our scenario, the Livy server serves many 
> notebook and JDBC services. We want to make the Livy service more 
> fault-tolerant and scalable.
> There are already some proposals in the community for high availability, but 
> they are either incomplete or only cover active-standby high availability. So 
> we propose a multi-active high availability design to achieve the following 
> goals:
> # One or more servers will serve the client requests at the same time.
> # Sessions are allocated among different servers.
> # When one node crashes, the affected sessions will be moved to other active 
> services.
> Here's our design document; please review and comment:
> https://docs.google.com/document/d/1bD3qYZpw14_NuCcSGUOfqQ0pqvSbCQsOLFuZp26Ohjc/edit?usp=sharing

This message was sent by Atlassian Jira
