[ 
https://issues.apache.org/jira/browse/LIVY-718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17013498#comment-17013498
 ] 

Bikas Saha commented on LIVY-718:
---------------------------------

Clearly, I am not aligned with the above, or else I would not have started and 
pushed this discussion :)

In my experience, the cost of code refactoring is paid once, up front, and is 
easier to test than operational complexity and runtime correctness/reliability.

However, if others are on board with the current proposal, then I will not 
pursue this discussion further.

 

On the voting thread, IIRC the ask had been to add more details to the design 
doc and align on parts where no conclusion has been reached yet. Is that done? 
I ask because a couple of PRs are already committed, indicating that coding has 
started. Even if we go with the proposal in the document, having the details 
watertight and converged is super important for a feature like this, which 
involves distributed state and coordination. These things are notoriously 
difficult to get right. So the more we can solidify the design up front, the 
safer it will be to implement.

> Support multi-active high availability in Livy
> ----------------------------------------------
>
>                 Key: LIVY-718
>                 URL: https://issues.apache.org/jira/browse/LIVY-718
>             Project: Livy
>          Issue Type: Epic
>          Components: RSC, Server
>            Reporter: Yiheng Wang
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> In this JIRA we want to discuss how to implement multi-active high 
> availability in Livy.
> Currently, Livy only supports single-node recovery. This is not sufficient in 
> some production environments. In our scenario, the Livy server serves many 
> notebook and JDBC services. We want to make the Livy service more 
> fault-tolerant and scalable.
> There are already some proposals in the community for high availability, but 
> they are either incomplete or cover only active-standby high availability. So 
> we propose a multi-active high availability design to achieve the following 
> goals:
> # One or more servers will serve the client requests at the same time.
> # Sessions are allocated among different servers.
> # When one node crashes, the affected sessions will be moved to other active 
> services.
> Here's our design document, please review and comment:
> https://docs.google.com/document/d/1bD3qYZpw14_NuCcSGUOfqQ0pqvSbCQsOLFuZp26Ohjc/edit?usp=sharing
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
