Jeffrey(Xilang) Yan created LIVY-698:
----------------------------------------

             Summary: Cluster support for Livy
                 Key: LIVY-698
                 URL: https://issues.apache.org/jira/browse/LIVY-698
             Project: Livy
          Issue Type: New Feature
          Components: Server
    Affects Versions: 0.6.0
            Reporter: Jeffrey(Xilang) Yan


This is a proposal for Livy cluster support. It is designed to be a lightweight 
solution that stays compatible with non-cluster deployments and supports 
different HA levels.

Server ID: an integer configured by livy.server.id, with a default value of 0, 
which means standalone mode. The largest server ID is 213, for the reason 
described below.
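
For example, in livy.conf (the value here is just an illustration):

{code}
livy.server.id = 2
{code}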

Session ID: a session ID is generated as a 3-digit server ID followed by a 
7-digit auto-incremented integer. Since the largest 32-bit signed integer is 
2,147,483,647, the largest server ID is 213 and each server can have 9,999,999 
sessions. The limitation is that a cluster can have at most 213 instances, 
which I think is enough. Standalone mode works the same way, since its server 
ID is 0.
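
A minimal sketch of this arithmetic (class and method names are hypothetical, 
not Livy API):

{code:scala}
import java.util.concurrent.atomic.AtomicInteger

class SessionIdGenerator(serverId: Int) {
  private val Range = 10000000 // 7-digit counter space per server

  // 213 is the largest prefix whose full counter range fits in a signed Int:
  // 213 * 10000000 + 9999999 = 2139999999 < Int.MaxValue = 2147483647
  require(serverId >= 0 && serverId <= (Int.MaxValue - (Range - 1)) / Range)

  private val counter = new AtomicInteger(0)

  // server 101, first session => 1010000001; standalone (id 0) => 1
  def next(): Int = serverId * Range + counter.incrementAndGet()
}

object SessionIdGenerator {
  // Recover the owning server from a session ID, e.g. 1010000001 => 101
  def serverOf(sessionId: Int): Int = sessionId / 10000000
}
{code}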

ZooKeeper: ZooKeeper is required; its quorum is configured by 
livy.server.zookeeper.quorum.
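
For example (host names are placeholders):

{code}
livy.server.zookeeper.quorum = zk1:2181,zk2:2181,zk3:2181
{code}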

Server Registration: each server registers itself under the ZooKeeper path 
/livy/server/
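
A sketch of the registration step using Apache Curator (which Livy already 
uses for its ZooKeeper state store); the payload format is an assumption:

{code:scala}
import org.apache.curator.framework.{CuratorFramework, CuratorFrameworkFactory}
import org.apache.curator.retry.ExponentialBackoffRetry
import org.apache.zookeeper.CreateMode

object ServerRegistration {
  def register(quorum: String, serverId: Int,
               host: String, port: Int): CuratorFramework = {
    val client =
      CuratorFrameworkFactory.newClient(quorum, new ExponentialBackoffRetry(1000, 3))
    client.start()
    // Ephemeral node: it disappears automatically when this server's ZK
    // session dies, which is what makes failure detection possible later.
    client.create()
      .creatingParentsIfNeeded()
      .withMode(CreateMode.EPHEMERAL)
      .forPath(s"/livy/server/$serverId", s"$host:$port".getBytes("UTF-8"))
    client
  }
}
{code}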

Leader: one of the Livy servers is elected as the cluster leader via the 
ZooKeeper path /livy/leader/
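
Election could reuse Curator's LeaderLatch recipe; a sketch:

{code:scala}
import org.apache.curator.framework.CuratorFramework
import org.apache.curator.framework.recipes.leader.LeaderLatch

// Every server starts a latch on the same path; Curator elects one leader
// and re-elects automatically if the leader's ZK session is lost.
class LeaderElection(client: CuratorFramework, serverId: Int) {
  private val latch = new LeaderLatch(client, "/livy/leader", serverId.toString)

  def start(): Unit = latch.start()

  // Only the leader performs failure detection and session reassignment
  // (described under "Session HA mode" below).
  def isLeader: Boolean = latch.hasLeadership
}
{code}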

Coordination between servers: servers do not talk to each other directly. A 
server only detects which Livy server a session lives on and sends an HTTP 307 
redirect to the correct server. For example, if server A receives a request 
for http://serverA:8999/sessions/1010000001 and knows the session is on server 
B, it sends a 307 redirect to http://serverB:8999/sessions/1010000001.
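
A sketch of that check in servlet terms (how the owner is found depends on 
the HA mode below; all names here are illustrative):

{code:scala}
import javax.servlet.http.HttpServletResponse

object Redirects {
  // 'owner' comes from the session ID prefix or a ZK lookup (see the two HA
  // modes below); 'self' is this server's advertised host:port.
  def routeOrHandle(sessionId: Int, owner: String, self: String,
                    resp: HttpServletResponse): Boolean = {
    if (owner == self) {
      true // session is local, handle the request here
    } else {
      // 307 preserves the HTTP method and body, unlike a 301/302
      resp.setStatus(HttpServletResponse.SC_TEMPORARY_REDIRECT)
      resp.setHeader("Location", s"http://$owner/sessions/$sessionId")
      false
    }
  }
}
{code}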

 

Session HA: considering the server-failure case, users should be able to 
decide whether they want to keep the sessions that were on the failed server. 
This leads to two different modes:

Non-session-HA mode:

With this mode, sessions are lost when a server fails (though the session 
store can still be used to recover sessions once the server comes back).

request redirect: a server determines a session's correct server simply by 
taking the first 3 digits of the session ID (the serverOf sketch above).

Session HA mode:

With this mode, session information is persisted to the ZK store (reusing the 
current session-store code) and recovered on another server when a server 
fails.

session registration: all sessions are registered under the ZK path 
/livy/session/type/
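
A sketch of what registration could look like ("type" would distinguish, 
e.g., interactive from batch sessions; the payload is an assumption):

{code:scala}
import org.apache.curator.framework.CuratorFramework
import org.apache.zookeeper.CreateMode

object SessionRegistry {
  // Persistent (not ephemeral) node: the session record must outlive the
  // owning server so that another server can recover the session.
  def register(client: CuratorFramework, sessionType: String,
               sessionId: Int, owner: String): Unit = {
    client.create()
      .creatingParentsIfNeeded()
      .withMode(CreateMode.PERSISTENT)
      .forPath(s"/livy/session/$sessionType/$sessionId",
               owner.getBytes("UTF-8"))
  }
}
{code}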

request redirect: each server determines the correct server for a session by 
reading ZK and then sends a 307 redirect; servers may cache the 
session-to-server mapping for a while.
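
A sketch of the cached lookup, using Guava's loading cache (the 60-second TTL 
is an arbitrary example; a stale entry only costs one extra redirect hop):

{code:scala}
import java.util.concurrent.TimeUnit
import com.google.common.cache.{CacheBuilder, CacheLoader, LoadingCache}
import org.apache.curator.framework.CuratorFramework

class SessionLocator(client: CuratorFramework, sessionType: String) {
  // Cache session -> owner so ZK is not hit on every request.
  private val cache: LoadingCache[Integer, String] =
    CacheBuilder.newBuilder()
      .expireAfterWrite(60, TimeUnit.SECONDS)
      .build(new CacheLoader[Integer, String]() {
        override def load(sessionId: Integer): String =
          new String(
            client.getData.forPath(s"/livy/session/$sessionType/$sessionId"),
            "UTF-8")
      })

  def ownerOf(sessionId: Int): String = cache.get(sessionId)
}
{code}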

server failure: the cluster leader detects server failures and reassigns the 
sessions of the failed server to other servers; those servers then recover the 
sessions by reading the session information in ZK.
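
Failure detection could hang off the ephemeral registration nodes; a sketch 
with Curator's PathChildrenCache (the reassignment policy itself is left 
abstract):

{code:scala}
import org.apache.curator.framework.CuratorFramework
import org.apache.curator.framework.recipes.cache.{
  PathChildrenCache, PathChildrenCacheEvent, PathChildrenCacheListener}

class FailureDetector(client: CuratorFramework, reassign: String => Unit) {
  // Watch /livy/server: entries are ephemeral, so CHILD_REMOVED means that
  // server's ZK session (and, almost certainly, the server) is gone.
  private val watcher = new PathChildrenCache(client, "/livy/server", true)

  def start(): Unit = {
    watcher.getListenable.addListener(new PathChildrenCacheListener {
      override def childEvent(c: CuratorFramework,
                              e: PathChildrenCacheEvent): Unit = {
        if (e.getType == PathChildrenCacheEvent.Type.CHILD_REMOVED) {
          // Only the leader should act on this event.
          reassign(e.getData.getPath.stripPrefix("/livy/server/"))
        }
      }
    })
    watcher.start()
  }
}
{code}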

server notification: servers need to send messages to one another; for 
example, the leader sends a command asking a server to recover a session. All 
such messages are sent through ZK, under the path /livy/notification/
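
A sketch of the sending side (per-server subtrees and the message encoding 
are assumptions; the receiver would watch its own subtree and delete nodes 
after processing them):

{code:scala}
import org.apache.curator.framework.CuratorFramework
import org.apache.zookeeper.CreateMode

object Notifications {
  // e.g. leader -> server 2: "recover 1010000001". A sequential node keeps
  // messages ordered per target server.
  def send(client: CuratorFramework, targetServerId: Int, msg: String): Unit = {
    client.create()
      .creatingParentsIfNeeded()
      .withMode(CreateMode.PERSISTENT_SEQUENTIAL)
      .forPath(s"/livy/notification/$targetServerId/msg-",
               msg.getBytes("UTF-8"))
  }
}
{code}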

 

Thrift Server: a Thrift session has its own Thrift session ID, and in 
principle a Thrift session can be restored using the same session ID; however, 
the current hive-jdbc does not support restoring a session. So for the Thrift 
server, we just register the server in ZK to implement the same server 
discovery that Hive has. Later, when hive-jdbc supports restoring sessions, 
the Thrift server can adopt the existing Livy cluster solution.
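
For reference, this is the shape of connection string that Hive's ZooKeeper 
service discovery uses today, which the registration above would have to 
satisfy (the namespace value is an assumption):

{code}
jdbc:hive2://zk1:2181,zk2:2181,zk3:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=livy
{code}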

 


