Jeffrey(Xilang) Yan created LIVY-698:
----------------------------------------
Summary: Cluster support for Livy
Key: LIVY-698
URL: https://issues.apache.org/jira/browse/LIVY-698
Project: Livy
Issue Type: New Feature
Components: Server
Affects Versions: 0.6.0
Reporter: Jeffrey(Xilang) Yan
This is a proposal for livy cluster support, it is designed to be a light
solution for livy cluster which can compatible with non-cluster and different
HA level.
Server ID: integer configured by livy.server.id, with default value 0 which is
standalone mode. Lagest server id is 213, reason described below.
Session ID: session ID is generated as 3 digit server ID and 7 digit
auto-increment integer. As the biggest integer is 2,147,483,647, so largest
server ID is 213 and each server can have 9,999,999 sessions. Limitation here
is each cluster can have most 213 instance which I think is enough. For
standalone mode, as server id is 0, so works the same.
Zookeeper: as zookeerp is required by config livy.server.zookeeper.quorum
Server Registration: each server should register themself to zookeeper path
/livy/server/
Leader: one of livy server is elected as leader of cluster on zookeeper path
/livy/leader/
Coordination between servers: servers don't talk with earch other directly,
server just detect which livy server the session lives on, and send http 307
redirect to the correct livy server. For example, if server A receive a
request http://serverA:8999/sessions/1010000001, server A know session is on
server B, then it send a 307 redirect to
http://serverB:8999/sessions/1010000001.
Session HA: consider server failure case, user should be able to decide if want
to keep sessions on failure server. This lead to two different mode:
Non session HA mode:
With this mode, session lost when sever failed(but it can still work with
session-store, recover session when server get well)
request redirect: Sever detect session's correct server just by get the first 3
digit of session id.
Session HA mode:
With this mode, session information will be persistent to ZK store(reuse
current session-store code), and recover in another server when server failed.
session registration: all sessions are registered to zk path
/livy/session/type/
request redirect: each server detect correct server of session by read zk and
then send 307 redirect, server may cache session-server pair for a while
server failure: cluster leader should detect server failure, reassign session
on failure server to other servers, other server should recover session by read
session information in ZK
server notification: server need to send msg to other server, for example
leader send command to ask server to recover session. all such msgs are sent
through zk, in path /livy/notification/
Thrift Server: Thrift session has thrift session ID, thrift session can be
restored with same session ID, however current hive-jdbc doesn't support
restore session. So for thrift server, just register server to ZK to implement
same server discovery as hive have. Later when hive-jdbc support restore
session, thrift server can use exisitng livy cluster solution to adapt
hive-jdbc.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)