This is an automated email from the ASF dual-hosted git repository.
urfree pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/pulsar-site.git
The following commit(s) were added to refs/heads/main by this push:
new 985536614ca Docs sync done from apache/pulsar(#9529850)
985536614ca is described below
commit 985536614cab66c1a4c44604ef3609107bfb5067
Author: Pulsar Site Updater <[email protected]>
AuthorDate: Thu Sep 1 12:00:53 2022 +0000
Docs sync done from apache/pulsar(#9529850)
---
site2/website-next/docs/cookbooks-deduplication.md | 25 +++++++++++++---------
site2/website-next/docs/io-elasticsearch-sink.md | 1 +
2 files changed, 16 insertions(+), 10 deletions(-)
diff --git a/site2/website-next/docs/cookbooks-deduplication.md
b/site2/website-next/docs/cookbooks-deduplication.md
index 702679641d7..de607c9ee14 100644
--- a/site2/website-next/docs/cookbooks-deduplication.md
+++ b/site2/website-next/docs/cookbooks-deduplication.md
@@ -4,6 +4,7 @@ title: Message deduplication
sidebar_label: "Message deduplication "
---
+
````mdx-code-block
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
@@ -12,11 +13,13 @@ import TabItem from '@theme/TabItem';
When **Message deduplication** is enabled, it ensures that each message
produced on Pulsar topics is persisted to disk *only once*, even if the message
is produced more than once. Message deduplication is handled automatically on
the server side.
-To use message deduplication in Pulsar, you need to configure your Pulsar
brokers and clients.
+Message deduplication could affect the performance of the brokers during
informational snapshots.
+
+To use message deduplication in Pulsar, you need to configure your Pulsar
brokers, namespaces, or topics. It is recommended to modify the configuration
in the clients, for example, setting send timeout to infinity.
## How it works
-You can enable or disable message deduplication at the namespace level or the
topic level. By default, it is disabled on all namespaces or topics. You can
enable it in the following ways:
+You can enable or disable message deduplication at broker, namespace, or topic
level. By default, it is disabled on all brokers, namespaces, or topics. You
can enable it in the following ways:
* Enable deduplication for all namespaces/topics at the broker-level.
* Enable deduplication for a specific namespace with the `pulsar-admin
namespaces` interface.
@@ -40,7 +43,7 @@ By default, message deduplication is *disabled* on all Pulsar
namespaces/topics.
Even if you set the value for `brokerDeduplicationEnabled`, enabling or
disabling via Pulsar admin CLI overrides the default settings at the
broker-level.
-### Enable message deduplication
+### Enable message deduplication at namespace or topic level
Though message deduplication is disabled by default at the broker level, you
can enable message deduplication for a specific namespace or topic using the
[`pulsar-admin namespaces set-deduplication`](/tools/pulsar-admin/) or the
[`pulsar-admin topics set-deduplication`](/tools/pulsar-admin/) command. You
can use the `--enable`/`-e` flag and specify the namespace/topic.
@@ -54,7 +57,7 @@ $ bin/pulsar-admin namespaces set-deduplication \
```
-### Disable message deduplication
+### Disable message deduplication at namespace or topic level
Even if you enable message deduplication at the broker level, you can disable
message deduplication for a specific namespace or topic using the
[`pulsar-admin namespace set-deduplication`](/tools/pulsar-admin/) or the
[`pulsar-admin topics set-deduplication`](/tools/pulsar-admin/) command. Use
the `--disable`/`-d` flag and specify the namespace/topic.
@@ -70,7 +73,9 @@ $ bin/pulsar-admin namespaces set-deduplication \
## Pulsar clients
-If you enable message deduplication in Pulsar brokers, you need complete the
following tasks for your client producers:
+If you enable message deduplication in Pulsar brokers, namespaces, or topics,
it is recommended to make the client retry sending each message indefinitely
until it succeeds. Otherwise, the ordering guarantee can be broken, because some
requests may time out and the application does not know whether the request was
successfully added to the topic.
+
+So you need to complete the following tasks for your client producers:
1. Specify a name for the producer.
1. Set the message timeout to `0` (namely, no timeout).
@@ -83,7 +88,7 @@ The instructions for Java, Python, and C++ clients are
different.
values={[{"label":"Java clients","value":"Java clients"},{"label":"Python
clients","value":"Python clients"},{"label":"C++ clients","value":"C++
clients"}]}>
<TabItem value="Java clients">
-To enable message deduplication on a [Java
producer](client-libraries-java#producer), set the producer name using the
`producerName` setter, and set the timeout to `0` using the `sendTimeout`
setter.
+To preserve the ordering guarantee on a [Java
producer](client-libraries-java.md#producers) sending to a topic with message
deduplication enabled, set the producer name using the `producerName` setter,
and set the timeout to `0` using the `sendTimeout` setter.
```java
@@ -105,7 +110,7 @@ Producer producer = pulsarClient.newProducer()
</TabItem>
<TabItem value="Python clients">
-To enable message deduplication on a [Python
producer](client-libraries-python#producer), set the producer name using
`producer_name`, and set the timeout to `0` using `send_timeout_millis`.
+To preserve the ordering guarantee on a [Python
producer](client-libraries-python.md#producers) sending to a topic with message
deduplication enabled, set the producer name using `producer_name`, and set the
timeout to `0` using `send_timeout_millis`.
```python
@@ -121,8 +126,7 @@ producer = client.create_producer(
</TabItem>
<TabItem value="C++ clients">
-
-To enable message deduplication on a [C++
producer](client-libraries-cpp/#create-a-producer), set the producer name using
`producer_name`, and set the timeout to `0` using `send_timeout_millis`.
+To preserve the ordering guarantee on a [C++
producer](client-libraries-cpp.md#producer) sending to a topic with message
deduplication enabled, set the producer name using `producer_name`, and set the
timeout to `0` using `send_timeout_millis`.
```cpp
@@ -147,4 +151,5 @@ Result result = client.createProducer(topic,
producerConfig, producer);
</TabItem>
</Tabs>
-````
\ No newline at end of file
+````
+
diff --git a/site2/website-next/docs/io-elasticsearch-sink.md
b/site2/website-next/docs/io-elasticsearch-sink.md
index 88f7fabb9b5..04e195a776b 100644
--- a/site2/website-next/docs/io-elasticsearch-sink.md
+++ b/site2/website-next/docs/io-elasticsearch-sink.md
@@ -89,6 +89,7 @@ The configuration of the Elasticsearch sink connector has the
following properti
| `canonicalKeyFields` | Boolean | false | false | Whether to sort the key
fields for JSON and Avro or not. If it is set to `true` and the record key
schema is `JSON` or `AVRO`, the serialized object does not consider the order
of properties. |
| `stripNonPrintableCharacters` | Boolean| false | true| Whether to remove all
non-printable characters from the document or not. If it is set to true, all
non-printable characters are removed from the document. |
| `idHashingAlgorithm` | enum(NONE,SHA256,SHA512)| false | NONE|Hashing
algorithm to use for the document id. This is useful in order to be compliant
with the ElasticSearch _id hard limit of 512 bytes. |
+| `copyKeyFields` | Boolean | false | false |If the message key schema is AVRO
or JSON, the message key fields are copied into the ElasticSearch document. |
### Definition of ElasticSearchSslConfig structure: