[ https://issues.apache.org/jira/browse/FLINK-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14613238#comment-14613238 ]
ASF GitHub Bot commented on FLINK-2288:
---------------------------------------
GitHub user uce opened a pull request:
https://github.com/apache/flink/pull/886
[wip] [FLINK-2288] [FLINK-2302] Setup ZooKeeper for distributed coordination
- FLINK-2288: Setup ZooKeeper for distributed coordination
* Add FlinkZooKeeperQuorumPeer to wrap ZooKeeper's quorum peers with
utilities to write required config values (default datadir, myid)
* Add default conf/zoo.cfg config for ZooKeeper
* Add startup scripts for ZooKeeper quorum
* Add conf/masters file for HA masters
* @rmetzger This PR includes docs ;-)
- FLINK-2302: Allow multiple instances to run on single host
* Multiple TaskManager and JobManager instances can run on a single
host.
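For context on the "required config values": in a replicated ZooKeeper setup, each server reads its id from a file named `myid` in its `dataDir`, and that id must match the `X` of its `server.X=host:peerPort:leaderPort` line in `zoo.cfg`. The sketch below is a hypothetical illustration of that step, not the code in `FlinkZooKeeperQuorumPeer`. The helper names (`parse_zoo_cfg`, `server_id_for_host`, `write_myid`) and the fallback `dataDir` passed in as `default_data_dir` are assumptions made for the example.

```python
import os
import tempfile


def parse_zoo_cfg(text):
    """Parse zoo.cfg-style 'key=value' lines, skipping blanks and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def server_id_for_host(props, hostname):
    """Return X from the first 'server.X=host:peerPort:leaderPort' entry for hostname."""
    for key, value in props.items():
        if key.startswith("server.") and value.split(":")[0] == hostname:
            return int(key[len("server."):])
    raise ValueError(f"no server.X entry for host {hostname!r}")


def write_myid(props, hostname, default_data_dir):
    """Fill in dataDir if it is missing, then write the myid file ZooKeeper reads."""
    # Hypothetical default: the caller chooses the fallback directory.
    data_dir = props.setdefault("dataDir", default_data_dir)
    os.makedirs(data_dir, exist_ok=True)
    server_id = server_id_for_host(props, hostname)
    with open(os.path.join(data_dir, "myid"), "w") as f:
        f.write(f"{server_id}\n")
    return server_id


if __name__ == "__main__":
    # Same single-server quorum as the docs example: server.0=localhost:2888:3888
    cfg = parse_zoo_cfg("server.0=localhost:2888:3888\n")
    data_dir = os.path.join(tempfile.mkdtemp(), "zookeeper")
    print(write_myid(cfg, "localhost", data_dir), cfg["dataDir"])
```

A real wrapper would also need to hand the completed configuration to ZooKeeper itself. This sketch stops after the `myid` file is written.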
@tillrohrmann, you can base your changes on this branch; after that we can
close this PR. I've added TODOs in TaskManager and JobManager where you need
to integrate your leader election/retrieval service.
From the docs:
## Example: Start and stop a local HA-cluster with 2 JobManagers
1. **Configure ZooKeeper quorum** in `conf/flink-conf.yaml`:
<pre>ha.zookeeper.quorum: localhost</pre>
2. **Configure masters** in `conf/masters`:
<pre>
localhost
localhost</pre>
3. **Configure ZooKeeper server** in `conf/zoo.cfg` (currently only one
ZooKeeper server can run per machine, because the configuration defines a
single client port):
<pre>server.0=localhost:2888:3888</pre>
4. **Start ZooKeeper quorum**:
<pre>
$ bin/start-zookeeper-quorum.sh
Starting zookeeper daemon on host localhost.</pre>
5. **Start an HA-cluster**:
<pre>
$ bin/start-cluster-streaming.sh
Starting HA cluster (streaming mode) with 2 masters and 1 peers in
ZooKeeper quorum.
Starting jobmanager daemon on host localhost.
Starting jobmanager daemon on host localhost.
Starting taskmanager daemon on host localhost.</pre>
6. **Stop cluster and ZooKeeper quorum**:
<pre>
$ bin/stop-cluster.sh
Stopping taskmanager daemon (pid: 7647) on localhost.
Stopping jobmanager daemon (pid: 7495) on host localhost.
Stopping jobmanager daemon (pid: 7349) on host localhost.
$ bin/stop-zookeeper-quorum.sh
Stopping zookeeper daemon (pid: 7101) on host localhost.</pre>
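The startup output in step 5 follows from the two config files: each line in `conf/masters` becomes one JobManager instance (so listing `localhost` twice yields two JobManagers on the same host, which FLINK-2302 enables), and each `server.X` line in `conf/zoo.cfg` counts as one quorum peer. The following is a hypothetical Python sketch of that bookkeeping, not the actual start scripts. The function names are invented, and the streaming-mode flag and TaskManager hosts are deliberately left out.

```python
def read_hosts(text):
    """One host per non-empty, non-comment line. Duplicates are kept on purpose,
    so listing 'localhost' twice yields two JobManager instances on that host."""
    hosts = []
    for raw in text.splitlines():
        line = raw.strip()
        if line and not line.startswith("#"):
            hosts.append(line)
    return hosts


def quorum_peers(zoo_cfg_text):
    """Hosts from 'server.X=host:peerPort:leaderPort' lines, in file order."""
    peers = []
    for raw in zoo_cfg_text.splitlines():
        line = raw.strip()
        if line.startswith("server.") and "=" in line:
            peers.append(line.split("=", 1)[1].split(":")[0])
    return peers


def plan_cluster_start(masters_text, zoo_cfg_text):
    """Return a summary line and the (daemon, host) pairs to start for the masters."""
    masters = read_hosts(masters_text)
    peers = quorum_peers(zoo_cfg_text)
    summary = (f"Starting HA cluster with {len(masters)} masters "
               f"and {len(peers)} peers in ZooKeeper quorum.")
    daemons = [("jobmanager", host) for host in masters]
    return summary, daemons


if __name__ == "__main__":
    # Inputs mirror steps 2 and 3 of the example above.
    summary, daemons = plan_cluster_start("localhost\nlocalhost\n",
                                          "server.0=localhost:2888:3888\n")
    print(summary)
    for daemon, host in daemons:
        print(f"Starting {daemon} daemon on host {host}.")
```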
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/uce/flink zk-2288
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/886.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #886
----
commit 2b6910edc6927c32aa7abdc42870ae1a0813a9ae
Author: Ufuk Celebi
Date: 2015-07-03T09:04:45Z
[FLINK-2288] [FLINK-2302] Setup ZooKeeper for distributed coordination
- FLINK-2288: Setup ZooKeeper for distributed coordination
* Add FlinkZooKeeperQuorumPeer to wrap ZooKeeper's quorum peers with
utilities to write required config values (default datadir, myid)
* Add default conf/zoo.cfg config for ZooKeeper
* Add startup scripts for ZooKeeper quorum
* Add conf/masters file for HA masters
- FLINK-2302: Allow multiple instances to run on single host
* Multiple TaskManager and JobManager instances can run on a single
host.
commit 7b4cd2c0ac49c120e1b4f97f4888e1d23ce90937
Author: Ufuk Celebi
Date: 2015-07-03T14:45:15Z
[FLINK-2288] [docs] Add docs for HA/ZooKeeper setup
----
> Setup ZooKeeper for distributed coordination
> --------------------------------------------
>
> Key: FLINK-2288
> URL: https://issues.apache.org/jira/browse/FLINK-2288
> Project: Flink
> Issue Type: Sub-task
> Components: JobManager, TaskManager
> Reporter: Ufuk Celebi
> Assignee: Ufuk Celebi
> Fix For: 0.10
>
>
> Having standby JM instances for JobManager high availability requires
> distributed coordination between JM, TM, and clients. For this, we will use
> ZooKeeper (ZK).
> Pros:
> - Proven solution (other projects use it for this as well)
> - Apache TLP with large community, docs, and library with required "recipes"
> like leader election (see below)
> Related Wiki:
> https://cwiki.apache.org/confluence/display/FLINK/JobManager+High+Availability
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)