This is an automated email from the ASF dual-hosted git repository.
liuyu pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/pulsar-site.git
The following commit(s) were added to refs/heads/main by this push:
new 97efc1147fe [improve][doc] SEO for Concepts and Architecture except
Overview and Messaging (#674)
97efc1147fe is described below
commit 97efc1147fe8ad5fa1dd6afbbbb114eeab18f6fc
Author: Zhang Yuxuan <[email protected]>
AuthorDate: Wed Aug 16 11:18:39 2023 +0800
[improve][doc] SEO for Concepts and Architecture except Overview and
Messaging (#674)
---
docs/concepts-architecture-overview.md | 15 +++---
docs/concepts-authentication.md | 1 +
docs/concepts-clients.md | 26 +++++++----
docs/concepts-cluster-level-failover.md | 62 +++++++++++++++++-------
docs/concepts-messaging.md | 2 +-
docs/concepts-multi-tenancy.md | 11 +++--
docs/concepts-multiple-advertised-listeners.md | 1 +
docs/concepts-proxy-sni-routing.md | 5 +-
docs/concepts-replication.md | 19 ++++----
docs/concepts-throttling.md | 17 +++----
docs/concepts-topic-compaction.md | 32 ++++++++++---
docs/reference-terminology.md | 65 +++++++-------------------
docs/tutorials-namespace.md | 4 +-
docs/tutorials-tenant.md | 2 +-
14 files changed, 148 insertions(+), 114 deletions(-)
diff --git a/docs/concepts-architecture-overview.md
b/docs/concepts-architecture-overview.md
index a8593a5e30d..0b5d40b730d 100644
--- a/docs/concepts-architecture-overview.md
+++ b/docs/concepts-architecture-overview.md
@@ -2,11 +2,12 @@
id: concepts-architecture-overview
title: Architecture Overview
sidebar_label: "Architecture"
+description: Get a comprehensive understanding of the architecture of Apache
Pulsar
---
At the highest level, a Pulsar instance is composed of one or more Pulsar
clusters. Clusters within an instance can [replicate](concepts-replication.md)
data amongst themselves.
-In a Pulsar cluster:
+A Pulsar cluster consists of the following components:
* One or more brokers handle and [load
balance](administration-load-balance.md) incoming messages from producers,
dispatch messages to consumers, communicate with the Pulsar configuration
store to handle various coordination tasks, store messages in BookKeeper
instances (aka bookies), rely on a cluster-specific ZooKeeper cluster for
certain tasks, and more.
* A BookKeeper cluster consisting of one or more bookies handles [persistent
storage](#persistent-storage) of messages.
@@ -56,7 +57,7 @@ In a Pulsar instance:
## Configuration store
-The configuration store maintains all the configurations of a Pulsar instance,
such as clusters, tenants, namespaces, partitioned topic-related
configurations, and so on. A Pulsar instance can have a single local cluster,
multiple local clusters, or multiple cross-region clusters. Consequently, the
configuration store can share the configurations across multiple clusters under
a Pulsar instance. The configuration store can be deployed on a separate
ZooKeeper cluster or deployed on an exi [...]
+The configuration store is a ZooKeeper quorum that is used for
configuration-specific tasks and it maintains all the configurations of a
Pulsar instance, such as clusters, tenants, namespaces, partitioned
topic-related configurations, and so on. A Pulsar instance can have a single
local cluster, multiple local clusters, or multiple cross-region clusters.
Consequently, the configuration store can share the configurations across
multiple clusters under a Pulsar instance. The configuration [...]
## Persistent storage
@@ -75,7 +76,7 @@ Pulsar uses a system called [Apache
BookKeeper](http://bookkeeper.apache.org/) f
* It's horizontally scalable in both capacity and throughput. Capacity can be
immediately increased by adding more bookies to a cluster.
* Bookies are designed to handle thousands of ledgers with concurrent reads
and writes. By using multiple disk devices (one for the journal and another for
general storage), bookies can isolate the effects of read operations from the
latency of ongoing write operations.
-In addition to message data, *cursors* are also persistently stored in
BookKeeper. Cursors are [subscription](reference-terminology.md#subscription)
positions for [consumers](reference-terminology.md#consumer). BookKeeper
enables Pulsar to store consumer position in a scalable fashion.
+In addition to message data, *cursors* are also persistently stored in
BookKeeper. Cursors are [subscription](concepts-messaging.md#subscriptions)
positions for [consumers](concepts-clients.md#consumer). BookKeeper enables
Pulsar to store consumer position in a scalable fashion.
At the moment, Pulsar supports persistent message storage. This accounts for
the `persistent` in all topic names. Here's an example:
@@ -83,12 +84,12 @@ At the moment, Pulsar supports persistent message storage.
This accounts for the
persistent://my-tenant/my-namespace/my-topic
```
-> Pulsar also supports ephemeral
([non-persistent](concepts-messaging.md#non-persistent-topics) message storage.
+> Pulsar also supports ephemeral
[non-persistent](concepts-messaging.md#non-persistent-topics) message storage.
You can see an illustration of how brokers and bookies interact in the diagram
below:
-
+
### Ledgers
@@ -144,13 +145,13 @@ Some important things to know about the Pulsar proxy:
## Service discovery
-[Clients](concepts-clients.md) connecting to Pulsar brokers need to be able to
communicate with an entire Pulsar instance using a single URL.
+Service discovery is a mechanism that enables connecting
[clients](concepts-clients.md) to use just a single URL to interact with an
entire Pulsar instance.
You can use your own service discovery system if you'd like. If you use your
own system, there is just one requirement: when a client performs an HTTP
request to an endpoint, such as `http://pulsar.us-west.example.com:8080`, the
client needs to be redirected to *some* active broker in the desired cluster,
whether via DNS, an HTTP or IP redirect, or some other means.
The diagram below illustrates Pulsar service discovery:
-
+
In this diagram, the Pulsar cluster is addressable via a single DNS name:
`pulsar-cluster.acme.com`. A [Python client](client-libraries-python.md), for
example, could access this Pulsar cluster like this:
diff --git a/docs/concepts-authentication.md b/docs/concepts-authentication.md
index fbefead69f3..e4672574bfd 100644
--- a/docs/concepts-authentication.md
+++ b/docs/concepts-authentication.md
@@ -2,6 +2,7 @@
id: concepts-authentication
title: Authentication and Authorization
sidebar_label: "Authentication and Authorization"
+description: Get a high-level understanding of authentication and
authorization in Pulsar.
---
Pulsar supports a pluggable [authentication](security-overview.md) mechanism
which can be configured at the proxy and/or the broker. Pulsar also supports a
pluggable [authorization](security-authorization.md) mechanism. These
mechanisms work together to identify the client and its access rights on
topics, namespaces and tenants.
diff --git a/docs/concepts-clients.md b/docs/concepts-clients.md
index cb82b4a9d31..735d3b4b188 100644
--- a/docs/concepts-clients.md
+++ b/docs/concepts-clients.md
@@ -2,6 +2,7 @@
id: concepts-clients
title: Pulsar Clients
sidebar_label: "Clients"
+description: Get a comprehensive understanding of client APIs with language
bindings for Java, C++, Go, Python, Node.js and C# in Pulsar.
---
Pulsar exposes a client API with language bindings for
[Java](client-libraries-java.md), [C++](client-libraries-cpp.md),
[Go](client-libraries-go.md), [Python](client-libraries-python.md),
[Node.js](client-libraries-node.md) and [C#](client-libraries-dotnet.md). The
client API optimizes and encapsulates Pulsar's client-broker communication
protocol and exposes a simple and intuitive API for use by applications.
@@ -12,18 +13,23 @@ Pulsar client libraries support transparent reconnection
and/or connection failo
Before an application creates a producer/consumer, the Pulsar client library
needs to initiate a setup phase including two steps:
-1. The client attempts to determine the owner of the topic by sending an HTTP
lookup request to the broker. The request could reach one of the active brokers
which, by looking at the (cached) zookeeper metadata knows who is serving the
topic or, in case nobody is serving it, tries to assign it to the least loaded
broker.
-2. Once the client library has the broker address, it creates a TCP connection
(or reuses an existing connection from the pool) and authenticates it. Within
this connection, the client and broker exchange binary commands from a custom
protocol. At this point, the client sends a command to create producer/consumer
to the broker, which will comply after having validated the authorization
policy.
+1. The client attempts to determine the owner of the topic by sending an HTTP
lookup request to the broker.
+
+   The request could reach one of the active brokers, which looks at the
(cached) ZooKeeper metadata to find out which broker is serving the topic or,
if no broker is serving it, tries to assign it to the least loaded broker.
+
+2. Once the client library has the broker address, it creates a TCP connection
(or reuses an existing connection from the pool) and authenticates it.
+
+ Within this connection, the client and broker exchange binary commands
from a custom protocol. At this point, the client sends a command to create
producer/consumer to the broker, which will comply after having validated the
authorization policy.
Whenever the TCP connection breaks, the client immediately re-initiates this
setup phase and keeps trying with exponential backoff to re-establish the
producer or consumer until the operation succeeds.
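
The retry behavior described above can be sketched roughly as follows. This is an illustrative Python sketch, not the Pulsar client implementation: the function names, the initial delay, the multiplier, the cap, and the attempt limit are all assumptions chosen for the example (the text above says the real client keeps retrying until it succeeds).

```python
import time
from typing import Callable, Iterator


def backoff_delays(initial: float = 0.1, factor: float = 2.0,
                   cap: float = 30.0) -> Iterator[float]:
    """Yield delays that grow geometrically and never exceed `cap` (hypothetical values)."""
    delay = initial
    while True:
        yield delay
        delay = min(delay * factor, cap)


def reconnect(setup: Callable[[], bool],
              sleep: Callable[[float], None] = time.sleep,
              max_attempts: int = 10) -> int:
    """Re-run the setup phase until it succeeds; return the number of attempts.

    `setup` stands in for the lookup + connect + create-producer/consumer steps.
    `max_attempts` only keeps this sketch finite.
    """
    delays = backoff_delays()
    for attempt in range(1, max_attempts + 1):
        if setup():
            return attempt
        # Wait a little longer after each failed attempt.
        sleep(next(delays))
    raise ConnectionError(f"setup did not succeed after {max_attempts} attempts")
```

Passing `sleep` as a parameter makes the loop easy to exercise without real waiting, for example by collecting the requested delays in a list.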
## Producer
-A producer is a process that attaches to a topic and publishes messages to a
Pulsar [broker](reference-terminology.md#broker). The Pulsar broker processes
the messages.
+A producer is a process that attaches to a topic and publishes messages to a
Pulsar [broker](concepts-architecture-overview.md#broker). The Pulsar broker
processes the messages.
### Send mode
-Producers send messages to brokers synchronously (sync) or asynchronously
(async).
+Send mode is a mechanism determining whether producers send messages to
brokers synchronously (sync) or asynchronously (async).
| Mode | Description
|
|:-----------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
@@ -32,7 +38,7 @@ Producers send messages to brokers synchronously (sync) or
asynchronously (async
### Access mode
-You can have different types of access modes on topics for producers.
+Access mode is a mechanism determining the permissions of producers on topics.
| Access mode | Description
[...]
|:-----------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
[...]
@@ -55,13 +61,13 @@ You can set producer access mode through [Java Client
API](/api/client/). For mo
A consumer is a process that attaches to a topic via a subscription and then
receives messages.
-
+
A consumer sends a [flow permit
request](developing-binary-protocol.md#flow-control) to a broker to get
messages. There is a queue at the consumer side to receive messages pushed from
the broker. You can configure the queue size with the
[`receiverQueueSize`](pathname:///reference/#/@pulsar:version_reference@/client/client-configuration-consumer?id=receiverqueuesize)
parameter (the default size is `1000`). Each time `consumer.receive()` is
called, a message is dequeued from the buffer.
### Receive mode
-Messages are received from [brokers](reference-terminology.md#broker) either
synchronously (sync) or asynchronously (async).
+Receive mode is a mechanism determining whether messages are received from
[brokers](concepts-architecture-overview.md#brokers) synchronously (sync) or
asynchronously (async).
| Mode | Description
|
|:--------------|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
@@ -74,7 +80,7 @@ Client libraries provide listener implementation for
consumers. For example, the
## Reader
-In Pulsar, the "standard" [consumer interface](#consumer) involves using
consumers to listen on [topics](reference-terminology.md#topic), process
incoming messages, and finally acknowledge those messages when they are
processed. Whenever a new subscription is created, it is initially positioned
at the end of the topic (by default), and consumers associated with that
subscription begin reading with the first message created afterward. Whenever
a consumer connects to a topic using a pre-e [...]
+In Pulsar, the "standard" [consumer interface](#consumer) involves using
consumers to listen on [topics](concepts-messaging.md#topics), process incoming
messages, and finally acknowledge those messages when they are processed.
Whenever a new subscription is created, it is initially positioned at the end
of the topic (by default), and consumers associated with that subscription
begin reading with the first message created afterward. Whenever a consumer
connects to a topic using a pre-exi [...]
The **reader interface** for Pulsar enables applications to manually manage
cursors. When you use a reader to connect to a topic---rather than a
consumer---you need to specify *which* message the reader begins reading from
when it connects to a topic. When connecting to a topic, the reader interface
enables you to begin with:
@@ -94,7 +100,7 @@ Please also note that a reader can have a "backlog", but the
metric is only used
:::
-
+
## TableView
@@ -110,4 +116,4 @@ Each TableView uses one Reader instance per partition, and
reads the topic start
The following figure illustrates the dynamic construction of a TableView
updated with newer values of each key.
-
+
diff --git a/docs/concepts-cluster-level-failover.md
b/docs/concepts-cluster-level-failover.md
index c9bde9a085f..0412c69f4df 100644
--- a/docs/concepts-cluster-level-failover.md
+++ b/docs/concepts-cluster-level-failover.md
@@ -2,8 +2,14 @@
id: concepts-cluster-level-failover
title: Cluster-level failover
sidebar_label: "Cluster-level failover"
+description: Get a comprehensive understanding of concepts, benefits, and use
cases about the cluster-level failover in Pulsar.
---
+````mdx-code-block
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+````
+
This chapter describes the concept, benefits, use cases, constraints, usage,
working principles, and more information about the cluster-level failover.
### Concept of cluster-level failover
@@ -16,14 +22,14 @@ This chapter describes the concept, benefits, use cases,
constraints, usage, wor
Automatic cluster-level failover supports Pulsar clients switching from a
primary cluster to one or several backup clusters automatically and seamlessly
when it detects a failover event based on the configured detecting policy set
by **users**.
-
+
</TabItem>
<TabItem value="Controlled cluster-level failover">
Controlled cluster-level failover supports Pulsar clients switching from a
primary cluster to one or several backup clusters. The switchover is manually
set by **administrators**.
-
+
</TabItem>
@@ -36,19 +42,29 @@ Once the primary cluster functions again, Pulsar clients
can switch back to the
The cluster-level failover provides fault tolerance, continuous availability,
and high availability together. It brings a number of benefits, including but
not limited to:
-* Reduced cost: services can be switched and recovered automatically with no
data loss.
+* Reduced cost
+
+ Services can be switched and recovered automatically with no data loss.
+
+* Simplified management
+
+ Businesses can operate on an "always-on" basis since no immediate user
intervention is required.
-* Simplified management: businesses can operate on an "always-on" basis since
no immediate user intervention is required.
+* Improved stability and robustness
-* Improved stability and robustness: it ensures continuous performance and
minimizes service downtime.
+ It ensures continuous performance and minimizes service downtime.
### When to use cluster-level failover?
The cluster-level failover protects your environment in a number of ways,
including but not limited to:
-* Disaster recovery: cluster-level failover can automatically and seamlessly
transfer the production workload on a primary cluster to one or several backup
clusters, which ensures minimum data loss and reduced recovery time.
+* Disaster recovery
+
+ Cluster-level failover can automatically and seamlessly transfer the
production workload on a primary cluster to one or several backup clusters,
which ensures minimum data loss and reduced recovery time.
-* Planned migration: if you want to migrate production workloads from an old
cluster to a new cluster, you can improve the migration efficiency with
cluster-level failover. For example, you can test whether the data migration
goes smoothly in case of a failover event, identify possible issues and risks
before the migration.
+* Planned migration
+
+ If you want to migrate production workloads from an old cluster to a new
cluster, you can improve the migration efficiency with cluster-level failover.
For example, you can test whether the data migration goes smoothly in case of a
failover event, identify possible issues and risks before the migration.
### When cluster-level failover is triggered?
@@ -60,13 +76,21 @@ The cluster-level failover protects your environment in a
number of ways, includ
Automatic cluster-level failover is triggered when Pulsar clients cannot
connect to the primary cluster for a prolonged period of time. This can be
caused by any number of reasons including, but not limited to:
-* Network failure: internet connection is lost.
+* Network failure
+
+ Internet connection is lost.
-* Power failure: shutdown time of a primary cluster exceeds time limits.
+* Power failure
-* Service error: errors occur on a primary cluster (for example, the primary
cluster does not function because of time limits).
+ Shutdown time of a primary cluster exceeds time limits.
-* Crashed storage space: the primary cluster does not have enough storage
space, but the corresponding storage space on the backup server functions
normally.
+* Service error
+
+ Errors occur on a primary cluster (for example, the primary cluster does
not function because of time limits).
+
+* Crashed storage space
+
+ The primary cluster does not have enough storage space, but the
corresponding storage space on the backup server functions normally.
</TabItem>
<TabItem value="Controlled cluster-level failover">
@@ -82,15 +106,19 @@ Controlled cluster-level failover is triggered when
administrators set the switc
Obviously, the cluster-level failover does not succeed if the backup cluster
is unreachable by active Pulsar clients. This can happen for many reasons,
including but not limited to:
-* Power failure: the backup cluster is shut down or does not function normally.
+* Power failure
-* Crashed storage space: primary and backup clusters do not have enough
storage space.
+ The backup cluster is shut down or does not function normally.
+
+* Crashed storage space
+
+ Primary and backup clusters do not have enough storage space.
* If the failover is initiated, but no cluster can assume the role of an
available cluster due to errors, and the primary cluster is not able to provide
service normally.
* If you manually initiate a switchover, but services cannot be switched to
the backup cluster server, then the system will attempt to switch services back
to the primary cluster.
-* Fail to authenticate or authorize between 1) primary and backup clusters, or
2) between two backup clusters.
+* Authentication or authorization fails between primary and backup clusters,
or between two backup clusters.
### What are the limitations of cluster-level failover?
@@ -132,11 +160,13 @@ In an automatic failover cluster, the primary cluster and
backup cluster are awa
3b) If the primary cluster does not come back, the Pulsar client does not
perform the switchover.
-
+
</TabItem>
<TabItem value="Controlled cluster-level failover">
+The controlled failover cluster performs the following actions with
administrator intervention:
+
1. The Pulsar client runs a probe task at intervals defined in `checkInterval`.
2. The probe task fetches the service URL configuration from the URL provider
service, which is configured by `urlProvider`.
@@ -151,7 +181,7 @@ In an automatic failover cluster, the primary cluster and
backup cluster are awa
3b) If the service URL configuration is not changed, it does not perform
the switchover.
-
+
</TabItem>
diff --git a/docs/concepts-messaging.md b/docs/concepts-messaging.md
index e7d1e5d0a3a..1701bd32ccb 100644
--- a/docs/concepts-messaging.md
+++ b/docs/concepts-messaging.md
@@ -64,7 +64,7 @@ Messages can be acknowledged in one of the following two ways:
- Being acknowledged individually
-With individual acknowledgment, the consumer acknowledges each message and
sends an acknowledgment request to the broker.
+ With individual acknowledgment, the consumer acknowledges each message and
sends an acknowledgment request to the broker.
- Being acknowledged cumulatively
diff --git a/docs/concepts-multi-tenancy.md b/docs/concepts-multi-tenancy.md
index a32750dc1d8..adda813bf8b 100644
--- a/docs/concepts-multi-tenancy.md
+++ b/docs/concepts-multi-tenancy.md
@@ -2,6 +2,7 @@
id: concepts-multi-tenancy
title: Multi Tenancy
sidebar_label: "Multi Tenancy"
+description: Get a comprehensive understanding of the concept of tenants and
namespaces in Pulsar.
---
Pulsar was created from the ground up as a multi-tenant system. To support
multi-tenancy, Pulsar has a concept of tenants. Tenants can be spread across
clusters and can each have their own [authentication and
authorization](security-overview.md) scheme applied to them. They are also the
administrative unit at which storage quotas, [message
TTL](cookbooks-retention-expiry.md#time-to-live-ttl), and isolation policies
can be managed.
@@ -16,14 +17,14 @@ As you can see, the tenant is the most basic unit of
categorization for topics (
## Tenants
-To each tenant in a Pulsar instance you can assign:
+A Pulsar tenant is an administrative unit for allocating capacity and
enforcing an authentication or authorization scheme. To each tenant in a Pulsar
instance, you can assign:
* An [authorization](security-authorization.md) scheme
* The set of [clusters](reference-terminology.md#cluster) to which the
tenant's configuration applies
## Namespaces
-Tenants and namespaces are two key concepts of Pulsar to support multi-tenancy.
+A Pulsar namespace is a logical grouping of topics. Tenants and namespaces
are two key concepts of Pulsar to support multi-tenancy.
* Pulsar is provisioned for specified tenants with appropriate capacity
allocated to the tenant.
* A namespace is the administrative unit nomenclature within a tenant. The
configuration policies set on a namespace apply to all the topics created in
that namespace. A tenant may create multiple namespaces via self-administration
using the REST API and the
[`pulsar-admin`](pathname:///reference/#/@pulsar:version_reference@/pulsar-admin/)
CLI tool. For instance, a tenant with different applications can create a
separate namespace for each application.
@@ -42,14 +43,14 @@ persistent://tenant/app1/topic-3
### Namespace change events and topic-level policies
-Pulsar is a multi-tenant event streaming system. Administrators can manage the
tenants and namespaces by setting policies at different levels. However, the
policies, such as retention policy and storage quota policy, are only available
at a namespace level. In many use cases, users need to set a policy at the
topic level. The namespace change events approach is proposed for supporting
topic-level policies in an efficient way. In this approach, Pulsar is used as
an event log to store name [...]
+Pulsar is a multi-tenant event streaming system. Administrators can manage the
tenants and namespaces by setting policies at different levels. However, the
policies, such as retention policy and storage quota policy, are only available
at a namespace level. In many use cases, users need to set a policy at the
topic level. The namespace change events approach is proposed for supporting
topic-level policies in an efficient way. In this approach, Pulsar is used as
an event log to store name [...]
- Avoid using ZooKeeper and introducing more load on it.
- Use Pulsar as an event log for propagating the policy cache. It can scale
efficiently.
- Use Pulsar SQL to query the namespace changes and audit the system.
-Each namespace has a [system topic](concepts-messaging.md#system-topic) named
`__change_events`. This system topic stores change events for a given
namespace. The following figure illustrates how to leverage it to update
topic-level policies.
+Each namespace has a [system topic](concepts-messaging.md#system-topic) named
`__change_events`. This system topic stores change events for a given
namespace. The following figure illustrates how to leverage the system topic to
update topic-level policies.
-
+
1. Pulsar Admin clients communicate with the Admin REST API to update
topic-level policies.
2. Any broker that receives the Admin HTTP request publishes a topic policy
change event to the corresponding system topic (`__change_events`) of the
namespace.
diff --git a/docs/concepts-multiple-advertised-listeners.md
b/docs/concepts-multiple-advertised-listeners.md
index b6b98af87a1..57cde663d1b 100644
--- a/docs/concepts-multiple-advertised-listeners.md
+++ b/docs/concepts-multiple-advertised-listeners.md
@@ -2,6 +2,7 @@
id: concepts-multiple-advertised-listeners
title: Multiple advertised listeners
sidebar_label: "Multiple advertised listeners"
+description: Get a comprehensive understanding of advertised listeners in
Pulsar.
---
When a Pulsar cluster is deployed in the production environment, you may
need to expose multiple advertised addresses for the broker. For example,
when you deploy a Pulsar cluster in Kubernetes and want other clients, which
are not in the same Kubernetes cluster, to connect to the Pulsar cluster, you
need to assign a broker URL to external clients. But clients in the same
Kubernetes cluster can still connect to the Pulsar cluster through the internal
network of Kubernetes.
diff --git a/docs/concepts-proxy-sni-routing.md
b/docs/concepts-proxy-sni-routing.md
index b4a6e4cbf7b..b191017ec50 100644
--- a/docs/concepts-proxy-sni-routing.md
+++ b/docs/concepts-proxy-sni-routing.md
@@ -2,6 +2,7 @@
id: concepts-proxy-sni-routing
title: Proxy support with SNI routing
sidebar_label: "Proxy support with SNI routing"
+description: Get a comprehensive understanding of ATS-SNI Routing in Pulsar.
You can also implement geo-replication with SNI routing.
---
````mdx-code-block
@@ -24,7 +25,7 @@ Pulsar supports SNI routing for geo-replication, so brokers
can connect to broke
This section explains how to set up and use ATS as a reverse proxy, so Pulsar
clients can connect to brokers through the ATS proxy using the SNI routing
protocol on TLS connection.
### Set up ATS Proxy for layer-4 SNI routing
-To support layer 4 SNI routing, you need to configure the `records.conf` and
`ssl_server_name.conf` files.
+To set up ATS proxy for layer 4 SNI routing, you need to configure the
`records.conf` and `ssl_server_name.conf` files.

@@ -138,7 +139,7 @@ client = Client("pulsar+ssl://ats-proxy:443",
### Pulsar geo-replication with SNI routing
You can use the ATS proxy for geo-replication. Pulsar brokers can connect to
brokers in other clusters by using SNI routing. To enable SNI routing for
broker connections across clusters, configure the SNI proxy URL in the
cluster metadata. Once the SNI proxy URL is configured, brokers can connect to
brokers in other clusters through the proxy over SNI routing.
-
+
In this example, a Pulsar cluster is deployed into two separate regions,
`us-west` and `us-east`. Both regions are configured with ATS proxy, and
brokers in each region run behind the ATS proxy. We configure the cluster
metadata for both clusters, so brokers in one cluster can use SNI routing and
connect to brokers in other clusters through the ATS proxy.
diff --git a/docs/concepts-replication.md b/docs/concepts-replication.md
index a16fb500cce..934659198fd 100644
--- a/docs/concepts-replication.md
+++ b/docs/concepts-replication.md
@@ -2,6 +2,7 @@
id: concepts-replication
title: Geo Replication
sidebar_label: "Geo Replication"
+description: Get a comprehensive understanding of geo-replication mechanisms
and patterns in Pulsar.
---
Regardless of industries, when an unforeseen event occurs and brings
day-to-day operations to a halt, an organization needs a well-prepared disaster
recovery plan to quickly restore service to clients. However, a disaster
recovery plan usually requires a multi-datacenter deployment with
geographically dispersed data centers. Such a multi-datacenter deployment
requires a geo-replication mechanism to provide additional redundancy in case a
data center fails.
@@ -10,7 +11,7 @@ Pulsar's geo-replication mechanism is typically used for
disaster recovery, enab
The diagram below illustrates the process of
[geo-replication](administration-geo.md). Whenever three producers (P1, P2 and
P3) respectively publish messages to the T1 topic in three clusters, those
messages are instantly replicated across clusters. Once the messages are
replicated, two consumers (C1 and C2) can consume those messages from their
clusters.
-
+
## Replication mechanisms
@@ -20,7 +21,7 @@ The geo-replication mechanism can be categorized into
synchronous geo-replicatio
An asynchronous geo-replicated cluster is composed of multiple physical
clusters set up in different data centers. Messages produced on a Pulsar topic
are first persisted to the local cluster and then replicated asynchronously to
the remote clusters by brokers.
-
+
In normal cases, when there are no connectivity issues, messages are
replicated immediately, at the same time as they are dispatched to local
consumers. Typically, end-to-end delivery latency is defined by the network
round-trip time (RTT) between the data centers. Applications can create
producers and consumers in any of the clusters, even when the remote clusters
are not reachable (for example, during a network partition).
@@ -30,7 +31,7 @@ Asynchronous geo-replication provides lower latency but may
result in weaker con
In synchronous geo-replication, data is synchronously replicated to multiple
data centers and the client has to wait for an acknowledgment from the other
data centers. As illustrated below, when the client issues a write request to
one cluster, the written data will be replicated to the other two data centers.
The write request is only acknowledged to the client when the majority of data
centers (in this example, at least 2 data centers) have acknowledged that the
write has been persisted.
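
The majority rule in the paragraph above can be illustrated with a small counting sketch. This is only an illustration in Python, with hypothetical function names. It does not model BookKeeper, whose own quorum settings govern the actual behavior.

```python
def majority(total_data_centers: int) -> int:
    """Smallest number of data centers that forms a strict majority."""
    return total_data_centers // 2 + 1


def write_acknowledged(acks: int, total_data_centers: int) -> bool:
    """A write is acknowledged to the client once a majority has persisted it."""
    return acks >= majority(total_data_centers)
```

With three data centers, as in the example above, `majority(3)` is 2, so a write is acknowledged after two data centers confirm it.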
-
+
Synchronous geo-replication in Pulsar is achieved by BookKeeper. A synchronous
geo-replicated cluster consists of a cluster of bookies and a cluster of
brokers that run in multiple data centers, and a global ZooKeeper installation
(a ZooKeeper ensemble is running across multiple data centers). You need to
configure a BookKeeper region-aware placement policy to store data across
multiple data centers and guarantee availability constraints on writes.
@@ -39,24 +40,26 @@ Synchronous geo-replication provides the highest
availability and also guarantee
## Replication patterns
-Pulsar provides a great degree of flexibility for customizing your replication
strategy. You can set up different replication patterns to serve your
replication strategy for an application between multiple data centers.
+Pulsar provides a great degree of flexibility for customizing your replication
strategy. You can set up different replication patterns to serve your
replication strategy for an application between multiple data centers.
+
+Pulsar supports the following replication patterns:
### Full-mesh replication
Using full-mesh replication and applying the [selective message
replication](administration-geo.md#selective-replication), you can customize
your replication strategies and topologies between any number of data centers.
-
+
### Active-active replication
Active-active replication is a variation of full-mesh replication, with only
two data centers. Producers can run at any data center to produce messages, and
consumers can consume all messages from all data centers.
-
+
To learn how to use active-active replication to migrate data between clusters,
see
[Migrate data between clusters using geo-replication](administration-geo.md#migrate-data-between-clusters-using-geo-replication).
### Aggregation replication
-The aggregation replication pattern is typically used when replicating
messages from the edge to the cloud. For example, assume you have 3 clusters in
3 fronting datacenters and one aggregated cluster in a central data center, and
you want to replicate messages from multiple fronting datacenters to the
central data center for aggregation purposes. You can then create an individual
namespace for the topics used by each fronting data center and assign the
aggregated data center to those na [...]
+The aggregation replication pattern is typically used when replicating
messages from the edge to the cloud. For example, assume you have 3 clusters in
3 fronting data centers and one aggregated cluster in a central data center,
and you want to replicate messages from multiple fronting data centers to the
central data center for aggregation purposes. You can then create an individual
namespace for the topics used by each fronting data center and assign the
aggregated data center to those [...]
-
+
diff --git a/docs/concepts-throttling.md b/docs/concepts-throttling.md
index c849caf1c8b..7ba77a0682b 100644
--- a/docs/concepts-throttling.md
+++ b/docs/concepts-throttling.md
@@ -2,6 +2,7 @@
id: concepts-throttling
title: Message dispatch throttling
sidebar_label: "Message throttling"
+description: Get a comprehensive understanding of message dispatch throttling
in Pulsar.
---
## Overview
@@ -12,9 +13,9 @@ Large message payloads can cause memory usage spikes that
lead to performance de
For example, if you set the dispatch rate limit to 10 messages per second, the
broker delivers at most 10 messages per second to the client.
-
+
-### Why use it?
+### Why use message dispatch throttling?
Message dispatch throttling brings the following benefits in detail:
@@ -30,7 +31,7 @@ Message dispatch throttling brings the following benefits in
detail:
When there is a large backlog of messages to consume, clients may receive a
large amount of data in a short period of time, which monopolizes their
computing resources. Since the client has no mechanisms to proactively limit
the consumption rate, using the message dispatch throttling feature can also
regulate the allocation of the client's hardware resources.
-### How it works?
+### How message dispatch throttling works?
The process of message dispatch throttling can be divided into the following
steps:
1. The broker approximates the number of entries to read from the bookies by
calculating the remaining quota.
@@ -48,7 +49,7 @@ The process of message dispatch throttling can be divided
into the following ste
### Throttling levels
-The following table outlines the three levels that you can throttle message
dispatch.
+You can set throttle message dispatch at different levels.
Level | Description
:-----|:------------
@@ -64,7 +65,7 @@ The dispatch rate limits configured at multiple levels take
effect simultaneousl
### Throttling approaches
-The following table outlines multiple approaches to configure the dispatch
rate limits at different levels.
+You can use multiple throttling approaches to configure dispatch rate limits
at different levels.
Approach | Per cluster | Per topic | Per subscription
:--------|:------------|:----------|:----------------
@@ -100,7 +101,7 @@ dispatchThrottlingOnNonBacklogConsumerEnabled | Whether the
dispatch throttling
:::
-## Limitations
+## Limitations of message dispatch throttling
Message dispatch throttling may over-deliver messages per unit of time for the
following reasons:
@@ -124,7 +125,7 @@ Message dispatch throttling may cause messages
over-delivered per unit of time d
The broker uses the average publish size in preference to the average
dispatch size. If the average publish size is unavailable, then it uses the
average dispatch size. When neither metric is available, the broker
reads only one entry on the first attempt.
- **b) The number of messages delivered to the client may exceed the
configured threshold.**
+ b) **The number of messages delivered to the client may exceed the
configured threshold.**
When you set the dispatch rate limit in message-count/throttling-period
(`dispatchThrottlingRateInMsg`/`ratePeriodInSecond`) and batching
(`batch-send`) is enabled, the broker counts an entry as one message (despite
the message count per entry) and calculates $$the \ number \ of \ entries \ to
\ read \ from \ bookies$$ through the following equation:
@@ -162,6 +163,6 @@ Message dispatch throttling may cause messages
over-delivered per unit of time d
When over-delivery happens, and the delivered message count exceeds the
quota in the current period, then the quota for the next period will be reduced
accordingly. For example, if the rate limit is set to `10/s`, and `11` messages
have been delivered to the client in the first period, then only up to `9`
messages can be delivered to the client in the next period; if 30 messages have
been delivered in the last period, the count of messages to deliver in the next
two periods is `0`.
- 
+ 
:::
\ No newline at end of file
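
The quota carry-over described in the over-delivery note above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the broker's implementation: the function name `simulate_quota` and the "debt" bookkeeping are hypothetical, and unused quota is assumed not to roll over into later periods.

```python
def simulate_quota(rate_limit, delivered_per_period):
    """Return the quota available at the start of each period.

    rate_limit: messages allowed per throttling period (assumed constant).
    delivered_per_period: messages actually delivered in each period.
    Over-delivery in one period is carried forward as debt that reduces
    the quota of the following periods until it is paid off.
    """
    quotas = []
    debt = 0  # messages delivered beyond the quota so far (hypothetical bookkeeping)
    for delivered in delivered_per_period:
        # The quota for this period is the rate limit minus outstanding debt.
        quotas.append(max(rate_limit - debt, 0))
        # Each period pays off up to rate_limit of debt; debt never goes negative.
        debt = max(debt + delivered - rate_limit, 0)
    return quotas


print(simulate_quota(10, [11, 9]))        # [10, 9]
print(simulate_quota(10, [30, 0, 0, 5]))  # [10, 0, 0, 10]
```

With a limit of 10 per period, delivering 11 messages in the first period leaves a quota of 9 for the second, and delivering 30 leaves a quota of 0 for the next two periods, which matches the example in the text.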
diff --git a/docs/concepts-topic-compaction.md
b/docs/concepts-topic-compaction.md
index 3c599e4b94a..5dfc7396df1 100644
--- a/docs/concepts-topic-compaction.md
+++ b/docs/concepts-topic-compaction.md
@@ -2,13 +2,14 @@
id: concepts-topic-compaction
title: Topic Compaction
sidebar_label: "Topic Compaction"
+descriptions: Get a comprehensive understanding of concepts, features, and
workflow of topic compaction in Apache Pulsar.
---
Pulsar was built with highly scalable [persistent
storage](concepts-architecture-overview.md#persistent-storage) of message data
as a primary objective. Pulsar topics enable you to persistently store as many
unacknowledged messages as you need while preserving message ordering. By
default, Pulsar stores *all* unacknowledged/unprocessed messages produced on a
topic. Accumulating many unacknowledged messages on a topic is necessary for
many Pulsar use cases but it can also be very time int [...]
> For a more practical guide to topic compaction, see the [Topic compaction
> cookbook](cookbooks-compaction.md).
-For some use cases, consumers don't need a complete "image" of the topic log.
They may only need a few values to construct a more "shallow" image of the log,
perhaps even just the most recent value. For these kinds of use cases, Pulsar
offers **topic compaction**. When you run compaction on a topic, Pulsar goes
through a topic's backlog and removes messages that are *obscured* by later
messages, i.e. it goes through the topic on a per-key basis and leaves only the
most recent message ass [...]
+For some use cases, consumers don't need a complete "image" of the topic log.
They may only need a few values to construct a more "shallow" image of the log,
perhaps even just the most recent value. For these kinds of use cases, Pulsar
offers **topic compaction**. When you run compaction on a topic, Pulsar goes
through a topic's backlog and removes messages that are *obscured* by later
messages, i.e. topic compaction goes through the topic on a per-key basis and
leaves only the most rece [...]
Pulsar's topic compaction feature:
@@ -23,14 +24,31 @@ Pulsar's topic compaction feature:
## How topic compaction works
-When topic compaction is triggered [via the CLI](cookbooks-compaction.md),
Pulsar will iterate over the entire topic from beginning to end. For each key
that it encounters the compaction routine will keep a record of the latest
occurrence of that key.
+When topic compaction is triggered [via the CLI](cookbooks-compaction.md), it
works in the following steps:
-After that, the broker will create a new [BookKeeper
ledger](concepts-architecture-overview.md#ledgers) and make a second iteration
through each message on the topic. For each message, if the key matches the
latest occurrence of that key, then the key's data payload, message ID, and
metadata will be written to the newly created ledger. If the key doesn't match
the latest then the message will be skipped and left alone. If any given
message has an empty payload, it will be skipped and con [...]
+1. Pulsar will iterate over the entire topic from beginning to end.
-After the initial compaction operation, the Pulsar
[broker](reference-terminology.md#broker) that owns the topic is notified
whenever any future changes are made to the compaction horizon and compacted
backlog. When such changes occur:
+ For each key that it encounters the compaction routine will keep a record of
the latest occurrence of that key.
-* Clients (consumers and readers) that have read compacted enabled will
attempt to read messages from a topic and either:
- * Read from the topic like normal (if the message ID is greater than or
equal to the compaction horizon) or
- * Read beginning at the compaction horizon (if the message ID is lower than
the compaction horizon)
+2. After that, the broker will create a new [BookKeeper
ledger](concepts-architecture-overview.md#ledgers) and make a second iteration
through each message on the topic. For each message:
+
+ - If the key matches the latest occurrence of that key, then the key's
data payload, message ID, and metadata will be written to the newly created
ledger.
+
+ - If the key doesn't match the latest then the message will be skipped and
left alone.
+
+ - If any given message has an empty payload, it will be skipped and
considered deleted (akin to the concept of
[tombstones](https://en.wikipedia.org/wiki/Tombstone_(data_store)) in key-value
databases).
+
+3. At the end of this second iteration through the topic, the newly created
BookKeeper ledger is closed and two things are written to the topic's metadata:
+
+ - The ID of the BookKeeper ledger
+ - The message ID of the last compacted message (this is known as the
**compaction horizon** of the topic).
+
+ Once this metadata is written compaction is complete.
+
+4. After the initial compaction operation, the Pulsar
[broker](concepts-architecture-overview.md#brokers) that owns the topic is
notified whenever any future changes are made to the compaction horizon and
compacted backlog. When such changes occur:
+
+ * Clients (consumers and readers) that have read compacted enabled will
attempt to read messages from a topic and either:
+ * Read from the topic like normal (if the message ID is greater than or
equal to the compaction horizon) or
+ * Read beginning at the compaction horizon (if the message ID is lower
than the compaction horizon)
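
The two-pass procedure in steps 1 to 3 above can be illustrated with a small in-memory sketch. It is a simplification under stated assumptions rather than Pulsar's actual code: the function name `compact` and the `(message_id, key, payload)` tuple layout are hypothetical, a Python list stands in for the BookKeeper ledger, and the compaction horizon is taken to be the ID of the last message scanned.

```python
def compact(messages):
    """Compact a topic backlog given as (message_id, key, payload) tuples.

    Returns (compacted, horizon), where compacted holds only the latest
    non-empty message per key and horizon is the ID of the last message
    covered by this compaction run (None for an empty backlog).
    """
    # Pass 1: record the ID of the latest occurrence of every key.
    latest = {}
    for message_id, key, _payload in messages:
        latest[key] = message_id

    # Pass 2: copy only the latest message for each key into the new "ledger".
    compacted = []
    for message_id, key, payload in messages:
        if latest[key] != message_id:
            continue  # obscured by a later message with the same key
        if not payload:
            continue  # empty payload: treated as a delete (tombstone)
        compacted.append((message_id, key, payload))

    horizon = messages[-1][0] if messages else None
    return compacted, horizon


backlog = [
    (1, "a", b"1"),
    (2, "b", b"2"),
    (3, "a", b"3"),  # obscures message 1
    (4, "b", b""),   # empty payload deletes key "b"
]
print(compact(backlog))  # ([(3, 'a', b'3')], 4)
```

In this sketch a reader with read compacted enabled would be served the compacted list for positions below the horizon and the regular topic for positions at or beyond it.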
diff --git a/docs/reference-terminology.md b/docs/reference-terminology.md
index 7e00c1726e7..0e6dedb53f5 100644
--- a/docs/reference-terminology.md
+++ b/docs/reference-terminology.md
@@ -12,29 +12,35 @@ Here is a glossary of terms related to Apache Pulsar:
Pulsar is a distributed messaging system originally created by Yahoo but now
under the stewardship of the Apache Software Foundation.
+### Message
+
+Messages are the basic unit of Pulsar. They're what [producers](#producer)
publish to [topics](#topic)
+and what [consumers](#consumer) then consume from topics.
+
+### Topic
+
+A named channel used to pass messages published by [producers](#producer) to
[consumers](#consumer) who
+process those [messages](#message).
+
+### Partitioned Topic
+
+A topic that is served by multiple Pulsar [brokers](#broker), which enables
higher throughput.
+
### Namespace Bundle
A virtual group of [topics](#topic) that belong to the same
[namespace](#namespace). A namespace bundle
is defined as a range between two 32-bit hashes, such as 0x00000000 and
0xffffffff.
-### Tenant
-
-An administrative unit for allocating capacity and enforcing an
authentication/authorization scheme.
+### Subscription
+A lease on a [topic](#topic) established by a group of [consumers](#consumer).
Pulsar has four subscription
+modes (exclusive, shared, failover and key_shared).
### Pub-Sub
A messaging pattern in which [producer](#producer) processes publish messages
on [topics](#topic) that
are then consumed (processed) by [consumer](#consumer) processes.
-### Producer
-
-A process that publishes [messages](#message) to a Pulsar [topic](#topic).
-
-### Consumer
-
-A process that establishes a subscription to a Pulsar [topic](#topic) and
processes messages published
-to that topic by [producers](#producer).
### Reader
@@ -78,18 +84,6 @@ A group of namespaces that have anti-affinity to each other.
A lightweight Pulsar broker in which all components run in a single Java
Virtual Machine (JVM) process. Standalone
clusters can be run on a single machine and are useful for development
purposes.
-### Cluster
-
-A Pulsar cluster consists of the following components:
-
-- One or more Pulsar [brokers](reference-terminology.md#broker)
-
-- One or more [BookKeeper](reference-terminology.md#bookkeeper) servers (aka
[bookies](reference-terminology.md#bookie))
-
-- A [ZooKeeper](https://zookeeper.apache.org) cluster that provides
configuration and coordination management
-
-Clusters can reside in different geographical regions and replicate messages
to one another in a process called [geo-replication](#geo-replication).
-
### Instance
A group of Pulsar [clusters](#cluster) that act together as a single unit.
@@ -99,34 +93,12 @@ A group of Pulsar [clusters](#cluster) that act together as
a single unit.
Replication of messages across Pulsar [clusters](#cluster), potentially in
different datacenters
or geographical regions.
-### Configuration Store
-
-Pulsar's configuration store (previously known as configuration store) is a
ZooKeeper quorum that
-is used for configuration-specific tasks. A multi-cluster Pulsar installation
requires just one
-configuration store across all [clusters](#cluster).
-
### Topic Lookup
A service provided by Pulsar [brokers](#broker) that enables connecting
clients to automatically determine
which Pulsar [cluster](#cluster) is responsible for a [topic](#topic) (and
thus where message traffic for
the topic needs to be routed).
-### Service Discovery
-
-A mechanism provided by Pulsar that enables connecting clients to use just a
single URL to interact
-with all the [brokers](#broker) in a [cluster](#cluster).
-
-### Broker
-
-A broker is a stateless component of Pulsar [clusters](#cluster). It consists
of two components:
-
-
-- An HTTP server exposing a REST interface for administration and topic lookup.
-
-- A [dispatcher](#dispatcher) that handles all message transfers.
-
-Pulsar clusters typically consist of multiple brokers.
-
### Dispatcher
An asynchronous TCP server used for all data transfers in and out of a Pulsar
[broker](#broker). The Pulsar
@@ -143,9 +115,6 @@ service that Pulsar uses to store data.
Bookie is the name of an individual BookKeeper server. It is effectively the
storage server of Pulsar.
-### Ledger
-
-An append-only data structure in [BookKeeper](#bookkeeper) that is used to
persistently store messages in Pulsar [topics](#topic).
### Functions
diff --git a/docs/tutorials-namespace.md b/docs/tutorials-namespace.md
index e32e3fcf26e..ad5c48f2f40 100644
--- a/docs/tutorials-namespace.md
+++ b/docs/tutorials-namespace.md
@@ -5,7 +5,9 @@ sidebar_label: "Create a namespace"
---
-[Namespaces](concepts-messaging.md#namespaces) can be managed via:
+
+
+[Namespaces](concepts-multi-tenancy.md#namespaces) can be managed via:
- The namespaces command of the pulsar-admin tool
- The /admin/v2/namespaces endpoint of the admin {@inject: rest:REST:/} API
diff --git a/docs/tutorials-tenant.md b/docs/tutorials-tenant.md
index c1c4f041248..57192dd19d3 100644
--- a/docs/tutorials-tenant.md
+++ b/docs/tutorials-tenant.md
@@ -5,7 +5,7 @@ sidebar_label: "Set up a tenant"
---
-Pulsar is a powerful messaging system you can use to process and route high
volumes of data. Each tenant provides a distinct unit of isolation with its own
set of roles, permissions, configuration settings, and bookmarks.
+Pulsar is a powerful messaging system you can use to process and route high
volumes of data. Each [tenant](concepts-multi-tenancy.md#tenants) provides a
distinct unit of isolation with its own set of roles, permissions,
configuration settings, and bookmarks.
In this tutorial, you will create a new tenant, named "apache" in your Pulsar
cluster, hosted in K8s helm.