liuxun created ZEPPELIN-3626:
--------------------------------
Summary: Cluster management and client module design
Key: ZEPPELIN-3626
URL: https://issues.apache.org/jira/browse/ZEPPELIN-3626
Project: Zeppelin
Issue Type: Sub-task
Components: zeppelin-server
Affects Versions: 0.9.0
Reporter: liuxun
Assignee: liuxun
h4. Cluster management service
The cluster management service uses the copycatServer class of the Raft
algorithm library to form a cluster in which every Zeppelin-Server holds a
consistent view of service state.
# The cluster management service runs in each Zeppelin-Server.
# The cluster management service establishes a cluster by using the
copycatServer class of the Raft algorithm library, maintains the
ClusterStateMachine, and manages the service state metadata of each
Zeppelin-Server through the PutCommand, GetQuery, and DeleteCommand
operations.
# The cluster management service launches a Thrift service so that cluster
interpreter processes can be created on any Zeppelin-Server through remote
calls.
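The three metadata operations can be illustrated with a minimal, dependency-free sketch. It does not use the Copycat API and performs no Raft replication: it only shows the map-like semantics that PutCommand, GetQuery, and DeleteCommand would have once a commit is applied. The class name ClusterStateMachineSketch, the "host:port" key format, and the property map layout are assumptions made for this example and are not taken from the design.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the ClusterStateMachine. In the real design a Raft
// library replicates each operation to all servers; here the operations are
// applied to a local in-memory map only.
class ClusterStateMachineSketch {
    // Node key (assumed "host:port") -> metadata properties of that node.
    private final Map<String, Map<String, Object>> meta = new ConcurrentHashMap<>();

    // PutCommand semantics: merge the given properties into the node's metadata.
    void put(String key, Map<String, Object> properties) {
        meta.compute(key, (k, existing) -> {
            Map<String, Object> merged =
                existing == null ? new HashMap<>() : new HashMap<>(existing);
            merged.putAll(properties);
            return merged;
        });
    }

    // GetQuery semantics: return a copy of the node's metadata, or null if absent.
    Map<String, Object> get(String key) {
        Map<String, Object> found = meta.get(key);
        return found == null ? null : new HashMap<>(found);
    }

    // DeleteCommand semantics: remove the node; true if it was present.
    boolean delete(String key) {
        return meta.remove(key) != null;
    }

    // Number of nodes currently registered.
    int size() {
        return meta.size();
    }
}
```

In this sketch a Zeppelin-Server would call put when it starts and delete when it shuts down, and any node could call get to read another node's state.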
h4. Cluster management client
The cluster management client uses the copycatClient class of the Raft
algorithm library to connect to the cluster management service and perform
metadata operations for services and processes.
# The cluster management client runs in each Zeppelin-Server and Zeppelin
Interpreter process.
# The cluster management client manages the state (metadata information) of
each Zeppelin-Server and Zeppelin Interpreter process in the
ClusterStateMachine by using the copycatClient class of the Raft library to
connect to the copycatServer. A Zeppelin-Server or Zeppelin Interpreter process
is added to the ClusterStateMachine when it starts and removed from the
ClusterStateMachine when it shuts down.
# In a distributed environment, network anomalies, network delays, or service
exceptions may occur. After copycatClient submits metadata to the cluster, it
checks whether the submission succeeded. If the submission failed, the metadata
is saved in a local message queue, and copycatClient retries the submission
from a separate commit thread.
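The retry behaviour described in the last item can be sketched as follows. This is an illustrative outline under stated assumptions, not the actual client code: the class RetryingMetaSubmitter, its method names, and the Predicate used to represent "submit to the cluster and report success" are all hypothetical, and the fixed retry interval is an arbitrary choice.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Predicate;

// Hypothetical helper: submit metadata once, queue it locally on failure, and
// let a single background commit thread retry the queued items in order.
class RetryingMetaSubmitter<T> {
    private final BlockingQueue<T> pending = new LinkedBlockingQueue<>();
    private final Predicate<T> submit; // returns true when the cluster accepted it

    RetryingMetaSubmitter(Predicate<T> submit) {
        this.submit = submit;
    }

    // First attempt; a failed submission is kept in the local queue.
    boolean submitOrQueue(T meta) {
        if (trySubmit(meta)) {
            return true;
        }
        pending.add(meta);
        return false;
    }

    // Treat runtime exceptions (e.g. network errors) the same as a failure.
    private boolean trySubmit(T meta) {
        try {
            return submit.test(meta);
        } catch (RuntimeException e) {
            return false;
        }
    }

    // One retry pass in FIFO order; stops at the first failure so ordering is
    // preserved. Intended to be called from one thread only.
    int retryPending() {
        int sent = 0;
        T head;
        while ((head = pending.peek()) != null) {
            if (!trySubmit(head)) {
                break;
            }
            pending.poll();
            sent++;
        }
        return sent;
    }

    int pendingCount() {
        return pending.size();
    }

    // Starts the separate commit thread that retries at a fixed interval.
    Thread startCommitThread(long intervalMillis) {
        Thread t = new Thread(() -> {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    retryPending();
                    Thread.sleep(intervalMillis);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "cluster-meta-commit");
        t.setDaemon(true);
        t.start();
        return t;
    }
}
```

Stopping a retry pass at the first failure keeps metadata updates in submission order, which matters if a later update for the same node must not be overtaken by an earlier one.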
h4. Cluster monitoring module
The cluster monitoring module checks whether each Zeppelin-Server and Zeppelin
Interpreter process in the cluster is active.
# The cluster monitoring module runs in each Zeppelin-Server and Zeppelin
Interpreter process and periodically sends heartbeat data for that service or
process to the cluster.
# When the cluster monitoring module runs in Zeppelin-Server, it collects the
CPU and memory usage of the server and sends the resource usage rate to the
cluster's ClusterStateMachine. When a cluster interpreter process needs to be
created, it is created on the server with the most idle resources.
# Resource usage statistics strategy: to smooth out momentary peaks and troughs
in server load, the cluster monitoring module reports the average resource
usage over the most recent period, so that server resources are allocated as
reasonably and effectively as possible.
# When the cluster monitoring module runs in Zeppelin-Server, it checks the
heartbeat data of each Zeppelin-Server and Zeppelin Interpreter process. If a
heartbeat times out, it treats the service or process as unavailable and
removes it from the cluster.
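The monitoring rules above (averaged usage, heartbeat timeout, and placement on the idlest server) can be sketched together in a small class. This is a simplified illustration, not the module's implementation: ClusterMonitorSketch and its method names are hypothetical, CPU and memory are collapsed into a single usage ratio between 0 and 1, "the most recent period" is modelled as a fixed number of samples, and timestamps are passed in explicitly to keep the example deterministic.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical monitor: keeps a sliding window of usage samples per node,
// drops nodes whose heartbeat is too old, and picks the least-loaded node.
class ClusterMonitorSketch {
    private final int window;         // number of recent samples to average
    private final long timeoutMillis; // heartbeat age after which a node is removed
    private final Map<String, Deque<Double>> samples = new HashMap<>();
    private final Map<String, Long> lastHeartbeat = new HashMap<>();

    ClusterMonitorSketch(int window, long timeoutMillis) {
        this.window = window;
        this.timeoutMillis = timeoutMillis;
    }

    // Record a heartbeat carrying the node's current usage ratio (0.0 to 1.0).
    synchronized void heartbeat(String node, double usage, long nowMillis) {
        Deque<Double> q = samples.computeIfAbsent(node, k -> new ArrayDeque<>());
        q.addLast(usage);
        if (q.size() > window) {
            q.removeFirst(); // keep only the most recent samples
        }
        lastHeartbeat.put(node, nowMillis);
    }

    // Average over the window, which damps momentary peaks and troughs.
    synchronized double averageUsage(String node) {
        Deque<Double> q = samples.get(node);
        if (q == null || q.isEmpty()) {
            return Double.NaN;
        }
        double sum = 0;
        for (double v : q) {
            sum += v;
        }
        return sum / q.size();
    }

    // Remove every node whose last heartbeat is older than the timeout.
    synchronized int removeTimedOut(long nowMillis) {
        int removed = 0;
        Iterator<Map.Entry<String, Long>> it = lastHeartbeat.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> e = it.next();
            if (nowMillis - e.getValue() > timeoutMillis) {
                samples.remove(e.getKey());
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    // Node with the lowest average usage, or null if no node is registered.
    synchronized String idlestNode() {
        String best = null;
        double bestAvg = Double.MAX_VALUE;
        for (String node : lastHeartbeat.keySet()) {
            double avg = averageUsage(node);
            if (avg < bestAvg) {
                best = node;
                bestAvg = avg;
            }
        }
        return best;
    }
}
```

A server creating a new interpreter process would, in this sketch, call removeTimedOut first and then idlestNode to choose the target host.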