This is an automated email from the ASF dual-hosted git repository.
hjf pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/pulsar.git
The following commit(s) were added to refs/heads/master by this push:
new efb973d [docs] Update deploy-bare-metal-multi-cluster.md (#11466)
efb973d is described below
commit efb973dbe09eb3f4639eb4a10dd675627e0a8378
Author: fengtao1998 <[email protected]>
AuthorDate: Wed Aug 11 13:35:52 2021 +0800
[docs] Update deploy-bare-metal-multi-cluster.md (#11466)
* Update deploy-bare-metal-multi-cluster.md
* Update deploy-bare-metal-multi-cluster.md
* Update deploy-bare-metal-multi-cluster.md
* Update deploy-bare-metal-multi-cluster.md
* Update deploy-bare-metal-multi-cluster.md
* Update deploy-bare-metal-multi-cluster.md
* Update deploy-bare-metal-multi-cluster.md
* Update site2/docs/deploy-bare-metal-multi-cluster.md
Co-authored-by: Jennifer Huang
<[email protected]>
* Update deploy-bare-metal-multi-cluster.md
* Update deploy-bare-metal-multi-cluster.md
Co-authored-by: Jennifer Huang
<[email protected]>
---
site2/docs/deploy-bare-metal-multi-cluster.md | 74 ++++++++++++---------------
1 file changed, 32 insertions(+), 42 deletions(-)
diff --git a/site2/docs/deploy-bare-metal-multi-cluster.md
b/site2/docs/deploy-bare-metal-multi-cluster.md
index 62b735b..374fe34 100644
--- a/site2/docs/deploy-bare-metal-multi-cluster.md
+++ b/site2/docs/deploy-bare-metal-multi-cluster.md
@@ -4,35 +4,28 @@ title: Deploying a multi-cluster on bare metal
sidebar_label: Bare metal multi-cluster
---
-> ### Tips
+> **Tips**
>
-> 1. Single-cluster Pulsar installations should be sufficient for all but the
most ambitious use cases. If you are interested in experimenting with
-> Pulsar or using it in a startup or on a single team, you had better opt for
a single cluster. For instructions on deploying a single cluster,
-> see the guide [here](deploy-bare-metal.md).
+> 1. You can use a single-cluster Pulsar installation in most use cases, such as experimenting with Pulsar or using Pulsar in a startup or on a single team. If you need to run a multi-cluster Pulsar instance, see the [guide](deploy-bare-metal-multi-cluster.md).
>
-> 2. If you want to use all builtin [Pulsar IO](io-overview.md) connectors in
your Pulsar deployment, you need to download `apache-pulsar-io-connectors`
-> package and install `apache-pulsar-io-connectors` under `connectors`
directory in the pulsar directory on every broker node or on every
function-worker node if you
-> run a separate cluster of function workers for [Pulsar
Functions](functions-overview.md).
+> 2. If you want to use all built-in [Pulsar IO](io-overview.md) connectors, you need to download the `apache-pulsar-io-connectors` package and install it under the `connectors` directory in the Pulsar directory on every broker node, or on every function-worker node if you run a separate cluster of function workers for [Pulsar Functions](functions-overview.md).
>
-> 3. If you want to use [Tiered Storage](concepts-tiered-storage.md) feature
in your Pulsar deployment, you need to download `apache-pulsar-offloaders`
-> package and install `apache-pulsar-offloaders` under `offloaders` directory
in the pulsar directory on every broker node. For more details of how to
configure
-> this feature, you can refer to the [Tiered storage
cookbook](cookbooks-tiered-storage.md).
+> 3. If you want to use the [Tiered Storage](concepts-tiered-storage.md) feature in your Pulsar deployment, you need to download the `apache-pulsar-offloaders` package and install it under the `offloaders` directory in the Pulsar directory on every broker node. For details on how to configure this feature, see the [Tiered storage cookbook](cookbooks-tiered-storage.md).
-A Pulsar *instance* consists of multiple Pulsar clusters working in unison.
You can distribute clusters across data centers or geographical regions and
replicate the clusters amongst themselves using
[geo-replication](administration-geo.md). Deploying a multi-cluster Pulsar
instance involves the following basic steps:
+A Pulsar instance consists of multiple Pulsar clusters working in unison. You can distribute clusters across data centers or geographical regions and replicate the clusters amongst themselves using [geo-replication](administration-geo.md). Deploying a multi-cluster Pulsar instance consists of the following steps:
-* Deploying two separate [ZooKeeper](#deploy-zookeeper) quorums: a
[local](#deploy-local-zookeeper) quorum for each cluster in the instance and a
[configuration store](#configuration-store) quorum for instance-wide tasks
-* Initializing [cluster metadata](#cluster-metadata-initialization) for each
cluster
-* Deploying a [BookKeeper cluster](#deploy-bookkeeper) of bookies in each
Pulsar cluster
-* Deploying [brokers](#deploy-brokers) in each Pulsar cluster
+1. Deploying two separate ZooKeeper quorums: a local quorum for each cluster
in the instance and a configuration store quorum for instance-wide tasks
+2. Initializing cluster metadata for each cluster
+3. Deploying a BookKeeper cluster of bookies in each Pulsar cluster
+4. Deploying brokers in each Pulsar cluster
-If you want to deploy a single Pulsar cluster, see [Clusters and
Brokers](getting-started-standalone.md#start-the-cluster).
> #### Run Pulsar locally or on Kubernetes?
-> This guide shows you how to deploy Pulsar in production in a non-Kubernetes
environment. If you want to run a standalone Pulsar cluster on a single machine
for development purposes, see the [Setting up a local
cluster](getting-started-standalone.md) guide. If you want to run Pulsar on
[Kubernetes](https://kubernetes.io), see the [Pulsar on
Kubernetes](deploy-kubernetes.md) guide, which includes sections on running
Pulsar on Kubernetes on [Google Kubernetes Engine](deploy-kubernetes#pul [...]
+> This guide shows you how to deploy Pulsar in production in a non-Kubernetes
environment. If you want to run a standalone Pulsar cluster on a single machine
for development purposes, see the [Setting up a local
cluster](getting-started-standalone.md) guide. If you want to run Pulsar on
[Kubernetes](https://kubernetes.io), see the [Pulsar on
Kubernetes](deploy-kubernetes.md) guide, which includes sections on running
Pulsar on Kubernetes, on Google Kubernetes Engine and on Amazon Web Services.
## System requirement
-Currently, Pulsar is available for 64-bit **macOS*, **Linux**, and
**Windows**. To use Pulsar, you need to install 64-bit JRE/JDK 8 or later
versions.
+Currently, Pulsar is available for 64-bit **macOS**, **Linux**, and
**Windows**. You need to install 64-bit JRE/JDK 8 or later versions.
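A quick way to confirm the Java requirement before proceeding, assuming `java` is on your `PATH`:

```bash
# Check that a 64-bit JDK/JRE 8 or later is installed and visible
$ java -version
```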
> **Note**
>
@@ -61,8 +54,6 @@ $ tar xvfz apache-pulsar-{{pulsar:version}}-bin.tar.gz
$ cd apache-pulsar-{{pulsar:version}}
```
-## What your package contains
-
The Pulsar binary package initially contains the following directories:
Directory | Contains
@@ -86,17 +77,17 @@ Directory | Contains
Each Pulsar instance relies on two separate ZooKeeper quorums.
-* [Local ZooKeeper](#deploy-local-zookeeper) operates at the cluster level and
provides cluster-specific configuration management and coordination. Each
Pulsar cluster needs to have a dedicated ZooKeeper cluster.
-* [Configuration Store](#deploy-the-configuration-store) operates at the
instance level and provides configuration management for the entire system (and
thus across clusters). An independent cluster of machines or the same machines
that local ZooKeeper uses can provide the configuration store quorum.
+* Local ZooKeeper operates at the cluster level and provides cluster-specific
configuration management and coordination. Each Pulsar cluster needs a
dedicated ZooKeeper cluster.
+* Configuration Store operates at the instance level and provides
configuration management for the entire system (and thus across clusters). An
independent cluster of machines or the same machines that local ZooKeeper uses
can provide the configuration store quorum.
-The configuration store quorum can be provided by an independent cluster of
machines or by the same machines used by local ZooKeeper.
+You can use an independent cluster of machines or the same machines used by
local ZooKeeper to provide the configuration store quorum.
### Deploy local ZooKeeper
ZooKeeper manages a variety of essential coordination-related and
configuration-related tasks for Pulsar.
-You need to stand up one local ZooKeeper cluster *per Pulsar cluster* for
deploying a Pulsar instance.
+You need to stand up one local ZooKeeper cluster per Pulsar cluster for
deploying a Pulsar instance.
To begin, add all ZooKeeper servers to the quorum configuration specified in
the [`conf/zookeeper.conf`](reference-configuration.md#zookeeper) file. Add a
`server.N` line for each node in the cluster to the configuration, where `N` is
the number of the ZooKeeper node. The following is an example for a three-node
cluster:
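The three-node example itself is unchanged context that this diff elides; assuming the `zk[1-3].us-west.example.com` hostnames used throughout this guide, the quorum block in `conf/zookeeper.conf` looks like:

```properties
server.1=zk1.us-west.example.com:2888:3888
server.2=zk2.us-west.example.com:2888:3888
server.3=zk3.us-west.example.com:2888:3888
```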
@@ -108,6 +99,8 @@ server.3=zk3.us-west.example.com:2888:3888
On each host, you need to specify the ID of the node in the `myid` file of each node, which is in the `data/zookeeper` folder of each server by default (you can change the file location via the [`dataDir`](reference-configuration.md#zookeeper-dataDir) parameter).
+> **Tip**
+>
> See the [Multi-server setup
> guide](https://zookeeper.apache.org/doc/r3.4.10/zookeeperAdmin.html#sc_zkMulitServerSetup)
> in the ZooKeeper documentation for detailed information on `myid` and more.
On a ZooKeeper server at `zk1.us-west.example.com`, for example, you could set
the `myid` value like this:
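The command example is elided from this diff; a minimal sketch, assuming the default `data/zookeeper` data directory:

```bash
# Assign ID 1 to the node on zk1.us-west.example.com
$ mkdir -p data/zookeeper
$ echo 1 > data/zookeeper/myid
```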
@@ -127,15 +120,15 @@ $ bin/pulsar-daemon start zookeeper
### Deploy the configuration store
-The ZooKeeper cluster that is configured and started up in the section above
is a *local* ZooKeeper cluster that you can use to manage a single Pulsar
cluster. In addition to a local cluster, however, a full Pulsar instance also
requires a configuration store for handling some instance-level configuration
and coordination tasks.
+The ZooKeeper cluster configured and started up in the section above is a
local ZooKeeper cluster that you can use to manage a single Pulsar cluster. In
addition to a local cluster, however, a full Pulsar instance also requires a
configuration store for handling some instance-level configuration and
coordination tasks.
-If you deploy a [single-cluster](#single-cluster-pulsar-instance) instance,
you do not need a separate cluster for the configuration store. If, however,
you deploy a [multi-cluster](#multi-cluster-pulsar-instance) instance, you
should stand up a separate ZooKeeper cluster for configuration tasks.
+If you deploy a single-cluster instance, you do not need a separate cluster
for the configuration store. If, however, you deploy a multi-cluster instance,
you should stand up a separate ZooKeeper cluster for configuration tasks.
#### Single-cluster Pulsar instance
If your Pulsar instance consists of just one cluster, then you can deploy a configuration store on the same machines as the local ZooKeeper quorum, but run it on different TCP ports.
-To deploy a ZooKeeper configuration store in a single-cluster instance, add
the same ZooKeeper servers that the local quorom uses to the configuration file
in
[`conf/global_zookeeper.conf`](reference-configuration.md#configuration-store)
using the same method for [local ZooKeeper](#local-zookeeper), but make sure to
use a different port (2181 is the default for ZooKeeper). The following is an
example that uses port 2184 for a three-node ZooKeeper cluster:
+To deploy a ZooKeeper configuration store in a single-cluster instance, add the same ZooKeeper servers that the local quorum uses. You need to use the configuration file in [`conf/global_zookeeper.conf`](reference-configuration.md#configuration-store) using the same method for [local ZooKeeper](#local-zookeeper), but make sure to use a different port (2181 is the default for ZooKeeper). The following is an example that uses port 2184 for a three-node ZooKeeper cluster:
```properties
clientPort=2184
@@ -150,20 +143,17 @@ As before, create the `myid` files for each server on
`data/global-zookeeper/myi
When you deploy a global Pulsar instance, with clusters distributed across
different geographical regions, the configuration store serves as a highly
available and strongly consistent metadata store that can tolerate failures and
partitions spanning whole regions.
-The key here is to make sure the ZK quorum members are spread across at least
3 regions and that other regions run as observers.
+The key here is to make sure the ZK quorum members are spread across at least
3 regions, and other regions run as observers.
-Again, given the very low expected load on the configuration store servers,
you can
-share the same hosts used for the local ZooKeeper quorum.
+Again, given the very low expected load on the configuration store servers,
you can share the same hosts used for the local ZooKeeper quorum.
-For example, assume a Pulsar instance with the following clusters `us-west`,
-`us-east`, `us-central`, `eu-central`, `ap-south`. Also assume, each cluster
has its own local ZK servers named such as the following:
+For example, assume a Pulsar instance with the following clusters: `us-west`, `us-east`, `us-central`, `eu-central`, and `ap-south`. Also assume that each cluster has its own local ZK servers named like the following:
```
zk[1-3].${CLUSTER}.example.com
```
-In this scenario if you want to pick the quorum participants from few clusters
and
-let all the others be ZK observers. For example, to form a 7 servers quorum,
you can pick 3 servers from `us-west`, 2 from `us-central` and 2 from `us-east`.
+In this scenario, you can pick the quorum participants from a few clusters and let all the others be ZK observers. For example, to form a 7-server quorum, you can pick 3 servers from `us-west`, 2 from `us-central`, and 2 from `us-east`.
This method guarantees that writes to the configuration store are possible even if one of these regions is unreachable.
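The corresponding ZooKeeper configuration is not shown in this diff; a hedged sketch using standard ZooKeeper observer syntax and the naming scheme above (the exact server split is illustrative):

```properties
# Quorum participants: 3 from us-west, 2 from us-central, 2 from us-east
server.1=zk1.us-west.example.com:2888:3888
server.2=zk2.us-west.example.com:2888:3888
server.3=zk3.us-west.example.com:2888:3888
server.4=zk1.us-central.example.com:2888:3888
server.5=zk2.us-central.example.com:2888:3888
server.6=zk1.us-east.example.com:2888:3888
server.7=zk2.us-east.example.com:2888:3888
# Servers in the remaining regions join as observers
server.8=zk1.eu-central.example.com:2888:3888:observer
server.9=zk1.ap-south.example.com:2888:3888:observer

# In addition, each observer host sets this in its own config:
peerType=observer
```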
@@ -204,7 +194,7 @@ $ bin/pulsar-daemon start configuration-store
## Cluster metadata initialization
-Once you set up the cluster-specific ZooKeeper and configuration store quorums
for your instance, you need to write some metadata to ZooKeeper for each
cluster in your instance. **you only needs to write these metadata once**.
+Once you set up the cluster-specific ZooKeeper and configuration store quorums for your instance, you need to write some metadata to ZooKeeper for each cluster in your instance. **You only need to write this metadata once.**
You can initialize this metadata using the
[`initialize-cluster-metadata`](reference-cli-tools.md#pulsar-initialize-cluster-metadata)
command of the [`pulsar`](reference-cli-tools.md#pulsar) CLI tool. The
following is an example:
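The example command is elided from this diff; it follows this shape (the `us-west` endpoints are the illustrative values used elsewhere in this guide):

```bash
$ bin/pulsar initialize-cluster-metadata \
  --cluster us-west \
  --zookeeper zk1.us-west.example.com:2181 \
  --configuration-store zk1.us-west.example.com:2184 \
  --web-service-url http://pulsar.us-west.example.com:8080 \
  --web-service-url-tls https://pulsar.us-west.example.com:8443 \
  --broker-service-url pulsar://pulsar.us-west.example.com:6650 \
  --broker-service-url-tls pulsar+ssl://pulsar.us-west.example.com:6651
```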
@@ -235,7 +225,7 @@ Make sure to run `initialize-cluster-metadata` for each
cluster in your instance
BookKeeper provides [persistent message
storage](concepts-architecture-overview.md#persistent-storage) for Pulsar.
-Each Pulsar broker needs to have its own cluster of bookies. The BookKeeper
cluster shares a local ZooKeeper quorum with the Pulsar cluster.
+Each Pulsar broker needs its own cluster of bookies. The BookKeeper cluster
shares a local ZooKeeper quorum with the Pulsar cluster.
### Configure bookies
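The bookie configuration steps are elided from this diff; the essential setting is to point each bookie at the cluster's local ZooKeeper quorum in `conf/bookkeeper.conf`, for example:

```properties
# Local ZooKeeper quorum of this Pulsar cluster
zkServers=zk1.us-west.example.com:2181,zk2.us-west.example.com:2181,zk3.us-west.example.com:2181
```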
@@ -252,7 +242,7 @@ $ bin/pulsar-daemon start bookie
```
You can verify that the bookie works properly using the `bookiesanity` command
for the [BookKeeper shell](reference-cli-tools.md#bookkeeper-shell):
-```shell
+```bash
$ bin/bookkeeper shell bookiesanity
```
@@ -272,7 +262,7 @@ Bookie hosts are responsible for storing message data on
disk. In order for book
Message entries written to bookies are always synced to disk before returning
an acknowledgement to the Pulsar broker. To ensure low write latency,
BookKeeper is
designed to use multiple devices:
-* A **journal** to ensure durability. For sequential writes, having fast
[fsync](https://linux.die.net/man/2/fsync) operations on bookie hosts is
critical. Typically, small and fast [solid-state
drives](https://en.wikipedia.org/wiki/Solid-state_drive) (SSDs) should suffice,
or [hard disk drives](https://en.wikipedia.org/wiki/Hard_disk_drive) (HDDs)
with a [RAID](https://en.wikipedia.org/wiki/RAID)s controller and a
battery-backed write cache. Both solutions can reach fsync latency of ~0.4 ms.
+* A **journal** to ensure durability. For sequential writes, having fast
[fsync](https://linux.die.net/man/2/fsync) operations on bookie hosts is
critical. Typically, small and fast [solid-state
drives](https://en.wikipedia.org/wiki/Solid-state_drive) (SSDs) should suffice,
or [hard disk drives](https://en.wikipedia.org/wiki/Hard_disk_drive) (HDDs)
with a [RAID](https://en.wikipedia.org/wiki/RAID) controller and a
battery-backed write cache. Both solutions can reach fsync latency of ~0.4 ms.
* A **ledger storage device** is where data is stored until all consumers
acknowledge the message. Writes happen in the background, so write I/O is not a
big concern. Reads happen sequentially most of the time and the backlog is
drained only in case of consumer drain. To store large amounts of data, a
typical configuration involves multiple HDDs with a RAID controller.
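The two devices described above map onto two settings in `conf/bookkeeper.conf`; a sketch with illustrative mount points:

```properties
# Journal on a small, fast device where fsync latency is low
journalDirectory=/mnt/journal/bookkeeper/journal
# Ledger storage on larger disks; writes happen in the background
ledgerDirectories=/mnt/ledgers/bookkeeper/ledgers
```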
@@ -333,17 +323,17 @@ $ bin/pulsar broker
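The broker configuration itself is elided from this diff; the core `conf/broker.conf` settings tie each broker to its local ZooKeeper quorum, the configuration store, and its cluster name, for example:

```properties
zookeeperServers=zk1.us-west.example.com:2181,zk2.us-west.example.com:2181,zk3.us-west.example.com:2181
configurationStoreServers=zk1.us-west.example.com:2184,zk2.us-west.example.com:2184,zk3.us-west.example.com:2184
clusterName=us-west
```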
## Service discovery
-[Clients](getting-started-clients.md) connecting to Pulsar brokers need to be
able to communicate with an entire Pulsar instance using a single URL. Pulsar
provides a built-in service discovery mechanism that you can set up using the
instructions [immediately below](#service-discovery-setup).
+[Clients](getting-started-clients.md) connecting to Pulsar brokers need to
communicate with an entire Pulsar instance using a single URL. Pulsar provides
a built-in service discovery mechanism that you can set up using the
instructions immediately below.
-You can also use your own service discovery system if you want. If you use
your own system, you only need to satisfy just one requirement: when a client
performs an HTTP request to an [endpoint](reference-configuration.md) for a
Pulsar cluster, such as `http://pulsar.us-west.example.com:8080`, the client
needs to be redirected to *some* active broker in the desired cluster, whether
via DNS, an HTTP or IP redirect, or some other means.
+You can also use your own service discovery system. If you use your own system, you only need to satisfy one requirement: when a client performs an HTTP request to an [endpoint](reference-configuration.md) for a Pulsar cluster, such as `http://pulsar.us-west.example.com:8080`, the client needs to be redirected to an active broker in the desired cluster, whether via DNS, an HTTP or IP redirect, or some other means.
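One hypothetical way to verify such a setup (the topic name here is illustrative): a topic-lookup request against the cluster URL should come back as a redirect to, or an answer from, an active broker.

```bash
# Expect a redirect or a lookup answer from a broker rather than an error
$ curl -v http://pulsar.us-west.example.com:8080/lookup/v2/topic/persistent/public/default/my-topic
```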
-> #### Service discovery already provided by many scheduling systems
+> **Service discovery already provided by many scheduling systems**
> Many large-scale deployment systems, such as
> [Kubernetes](deploy-kubernetes), have service discovery systems built in. If
> you run Pulsar on such a system, you may not need to provide your own
> service discovery mechanism.
### Service discovery setup
-The service discovery mechanism that included with Pulsar maintains a list of
active brokers, which stored in ZooKeeper, and supports lookup using HTTP and
also the [binary protocol](developing-binary-protocol.md) of Pulsar.
+The service discovery mechanism included with Pulsar maintains a list of
active brokers, which is stored in ZooKeeper, and supports lookup using HTTP
and also the [binary protocol](developing-binary-protocol.md) of Pulsar.
To get started setting up the built-in service discovery of Pulsar, you
need to change a few parameters in the
[`conf/discovery.conf`](reference-configuration.md#service-discovery)
configuration file. Set the
[`zookeeperServers`](reference-configuration.md#service-discovery-zookeeperServers)
parameter to the ZooKeeper quorum connection string of the cluster and the
[`configurationStoreServers`](reference-configuration.md#service-discovery-configurationStoreServers)
setting to the [con [...]
store](reference-terminology.md#configuration-store) quorum connection string.
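Assuming the example hostnames used earlier, those two settings in `conf/discovery.conf` might look like:

```properties
zookeeperServers=zk1.us-west.example.com:2181,zk2.us-west.example.com:2181,zk3.us-west.example.com:2181
configurationStoreServers=zk1.us-west.example.com:2184,zk2.us-west.example.com:2184,zk3.us-west.example.com:2184
```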