asafm commented on code in PR #875: URL: https://github.com/apache/pulsar-site/pull/875#discussion_r1547583880
########## docs/tutorials-redeliver-messages.md: ########## @@ -0,0 +1,236 @@ +--- +Id: tutorials-redeliver-messages +title: Consume best practice +sidebar_label: "Consume best practice" +description: Learn how to consume messages and redeliver unacknowledged messages in Pulsar. +--- + +# Consume Best Practice + +## Background Knowledge + +### Subscription Types + +Pulsar is a distributed message system where messages can be sent to topics by producers and consumed by consumers. +Consumers can subscribe to the topics in four ways (subscription types): + +* **Exclusive** +* **Failover** +* **Shared** +* **Key-shared** + +The messages are consumed in order for a single partition in Exclusive and Failover modes and out of order for Shared and +Key-shared mode. The main difference between Exclusive and Failover is that in Exclusive mode, the consumer is exclusive +to the entire topic, while in Failover mode, the consumer is exclusive to only one partition. Failover mode allows backup Review Comment: Even if the topic has partitions across brokers? How? ########## docs/tutorials-redeliver-messages.md: ########## @@ -0,0 +1,236 @@ +--- +Id: tutorials-redeliver-messages +title: Consume best practice +sidebar_label: "Consume best practice" +description: Learn how to consume messages and redeliver unacknowledged messages in Pulsar. +--- + +# Consume Best Practice + +## Background Knowledge + +### Subscription Types + +Pulsar is a distributed message system where messages can be sent to topics by producers and consumed by consumers. +Consumers can subscribe to the topics in four ways (subscription types): + +* **Exclusive** +* **Failover** +* **Shared** +* **Key-shared** + +The messages are consumed in order for a single partition in Exclusive and Failover modes and out of order for Shared and +Key-shared mode. The main difference between Exclusive and Failover is that in Exclusive mode, the consumer is exclusive +to the entire topic, while in Failover mode, the consumer is exclusive to only one partition. Failover mode allows backup Review Comment: Back--> Stand-by. The other consumers which are not active, are also connected to the partition in stand-by. ########## docs/tutorials-redeliver-messages.md: ########## @@ -0,0 +1,236 @@ +--- +Id: tutorials-redeliver-messages +title: Consume best practice +sidebar_label: "Consume best practice" +description: Learn how to consume messages and redeliver unacknowledged messages in Pulsar. +--- + +# Consume Best Practice + +## Background Knowledge + +### Subscription Types + +Pulsar is a distributed message system where messages can be sent to topics by producers and consumed by consumers. +Consumers can subscribe to the topics in four ways (subscription types): + +* **Exclusive** +* **Failover** +* **Shared** +* **Key-shared** + +The messages are consumed in order for a single partition in Exclusive and Failover modes and out of order for Shared and +Key-shared mode. The main difference between Exclusive and Failover is that in Exclusive mode, the consumer is exclusive +to the entire topic, while in Failover mode, the consumer is exclusive to only one partition. Failover mode allows backup +consumer connections that are not consumed. The main difference between Shared and Key-shared is whether their dispatch +strategy is implemented via a key. For more information about subscription type, refer to the [Pulsar website](https://pulsar.apache.org/docs/3.2.x/concepts-messaging/). + + +### Acknowledgment + +The messages should be acknowledged after they are fully consumed and processed, and then the messages would not be received +for the same subscription again. Pulsar provides two ways to acknowledge messages: + +* **Cumulative acknowledgment** +* **Individual acknowledgment** + +Cumulative acknowledgment receives a message or a message id as a parameter and marks the messages before the message as +consumed for this subscription. For multiple-partition topics, the cumulative acknowledgment will work for the single +partition without impacting other partitions. Individual acknowledgment receives a message or a message id as a parameter +and only marks this message as consumed for this subscription. + + +### Messages Redeliver Mechanism + +There might be instances where the received messages cannot be processed at this time or some errors happened during processing. +The client needs to redeliver the unacknowledged messages or a particular message after a delay or immediately. +Pulsar provides at-least-once semantics when the client does not enable transaction because the client may cache some Review Comment: I provided a better explanation above. What is the relationship to transactions - you mean when TX are enabled, and you commit - you know for sure it was persisted to disk? Even *after* subscription snapshot? ########## docs/tutorials-redeliver-messages.md: ########## @@ -0,0 +1,236 @@ +--- +Id: tutorials-redeliver-messages +title: Consume best practice +sidebar_label: "Consume best practice" +description: Learn how to consume messages and redeliver unacknowledged messages in Pulsar. +--- + +# Consume Best Practice + +## Background Knowledge + +### Subscription Types + +Pulsar is a distributed message system where messages can be sent to topics by producers and consumed by consumers. +Consumers can subscribe to the topics in four ways (subscription types): + +* **Exclusive** +* **Failover** +* **Shared** +* **Key-shared** + +The messages are consumed in order for a single partition in Exclusive and Failover modes and out of order for Shared and +Key-shared mode. The main difference between Exclusive and Failover is that in Exclusive mode, the consumer is exclusive +to the entire topic, while in Failover mode, the consumer is exclusive to only one partition. Failover mode allows backup +consumer connections that are not consumed. The main difference between Shared and Key-shared is whether their dispatch +strategy is implemented via a key. For more information about subscription type, refer to the [Pulsar website](https://pulsar.apache.org/docs/3.2.x/concepts-messaging/). + + +### Acknowledgment + +The messages should be acknowledged after they are fully consumed and processed, and then the messages would not be received +for the same subscription again. Pulsar provides two ways to acknowledge messages: + +* **Cumulative acknowledgment** +* **Individual acknowledgment** + +Cumulative acknowledgment receives a message or a message id as a parameter and marks the messages before the message as +consumed for this subscription. For multiple-partition topics, the cumulative acknowledgment will work for the single +partition without impacting other partitions. Individual acknowledgment receives a message or a message id as a parameter +and only marks this message as consumed for this subscription. + + Review Comment: Lets add NOTE: Please keep in mind Pulsar is an at-least-once system. When calling `acknowledge()` or `cumulativeAcknlowedge()`, it goes a long way before the acknowledgement is persisted to disk by broker. The acknowledgements are grouped and sent in a batch to the broker - restart of the client abruptly will lose those, since they are in-memory. On the broker, the subscription state is persisted to disk every configured interval hence if broker is abruptly restarted, the acks were not persisted hence lost. In all those occasions, the messages will be redelivered. ########## docs/tutorials-redeliver-messages.md: ########## @@ -0,0 +1,236 @@ +--- +Id: tutorials-redeliver-messages +title: Consume best practice +sidebar_label: "Consume best practice" +description: Learn how to consume messages and redeliver unacknowledged messages in Pulsar. +--- + +# Consume Best Practice + +## Background Knowledge + +### Subscription Types + +Pulsar is a distributed message system where messages can be sent to topics by producers and consumed by consumers. +Consumers can subscribe to the topics in four ways (subscription types): + +* **Exclusive** +* **Failover** +* **Shared** +* **Key-shared** + +The messages are consumed in order for a single partition in Exclusive and Failover modes and out of order for Shared and +Key-shared mode. The main difference between Exclusive and Failover is that in Exclusive mode, the consumer is exclusive +to the entire topic, while in Failover mode, the consumer is exclusive to only one partition. Failover mode allows backup +consumer connections that are not consumed. The main difference between Shared and Key-shared is whether their dispatch +strategy is implemented via a key. For more information about subscription type, refer to the [Pulsar website](https://pulsar.apache.org/docs/3.2.x/concepts-messaging/). + + +### Acknowledgment + +The messages should be acknowledged after they are fully consumed and processed, and then the messages would not be received +for the same subscription again. Pulsar provides two ways to acknowledge messages: + +* **Cumulative acknowledgment** +* **Individual acknowledgment** + +Cumulative acknowledgment receives a message or a message id as a parameter and marks the messages before the message as +consumed for this subscription. For multiple-partition topics, the cumulative acknowledgment will work for the single +partition without impacting other partitions. Individual acknowledgment receives a message or a message id as a parameter +and only marks this message as consumed for this subscription. + + +### Messages Redeliver Mechanism + +There might be instances where the received messages cannot be processed at this time or some errors happened during processing. +The client needs to redeliver the unacknowledged messages or a particular message after a delay or immediately. +Pulsar provides at-least-once semantics when the client does not enable transaction because the client may cache some +messages out of Pulsar when redelivering messages. + +Pulsar Consumer API provides four ways to reconsume the unacknowledged messages later: + +* **ackTimeout** +* **deadLetterPolicy** + * **reconsumeLaterCumulative** + * **reconsumeLater** +* **negativeAcknowledge** +* **redeliverUnacknowledgedMessages** + +The **ackTimeout** is an automatic protection mechanism. If a consumer configured ackTimeout, the messages will be +auto-redelivered when the received messages are not acknowledged after a long time. It works at the client side, ensuring +the unacknowledged messages will be redelivered to another consumer when the connection of a consumer remains active +but the business system gets stuck or misses to ack some messages. If the consumer gets disconnected when the client crashes, +the messages will be auto-redelivered by the broker too. However, this mechanism does not wait for the response of acknowledgment, +so if an acknowledgment fails on the broker side or proxy side, an ack hole may occur. + +The **deadLetterPolicy** is a policy in the message queue used to handle messages that cannot be processed properly. +In many message queue systems (such as RabbitMQ, Google Pub/Sub, Apache Pulsar, etc.), this strategy is implemented. +In Pulsar, the deadLetterPolicy is implemented at the client side, it creates a new retry letter topic and dead letter +topic when building a consumer with `deadLetterPolicy` configuration. When a consumer calls `reconsumeLaterCumulative` +or `reconsumeLater`, the message (method parameter) will be produced to the retry letter topic until the retry time reaches +the `maxRedeliverCount`. The message will be produced to the dead letter topic when the retry time reaches the ` +maxRedeliverCount`. The main difference between them is that `reconsumeLaterCumulative` will cumulative ack the +message (method parameter) after it is produced and `reconsumeLater` will individual ack the message (method parameter) +after it is produced. + +The `negativeAcknowledge` is used to redeliver certain unacknowledged messages while `redeliverUnacknowledgedMessages` +is used to redeliver all the unacknowledged messages received by this consumer. The main difference between them and +deadLetterPolicy is that there is no new topic created, and there is an unlimited number of redeliveries. + +## Best Practice Suggestion + +Different scenarios require different best practices. Users who value the order of partition messages and wish to batch +process data should choose or implement an appropriate routerPolicy to send a batch of ordered messages to the same +partition. They should also select either Exclusive or Failover subscription modes. For users who do not care about +message order and those in stream processing scenarios, they can opt to use Shared and Key-shared subscription modes. + +### Shared && Key-shared +#### At-least-once +The term `At least once` ensures that the message is processed at least once, and there is a risk of duplicate messages, Review Comment: This sentence was explained better above in the one I gave IMO. ########## docs/tutorials-redeliver-messages.md: ########## @@ -0,0 +1,236 @@ +--- +Id: tutorials-redeliver-messages +title: Consume best practice +sidebar_label: "Consume best practice" +description: Learn how to consume messages and redeliver unacknowledged messages in Pulsar. +--- + +# Consume Best Practice + +## Background Knowledge + +### Subscription Types + +Pulsar is a distributed message system where messages can be sent to topics by producers and consumed by consumers. +Consumers can subscribe to the topics in four ways (subscription types): + +* **Exclusive** +* **Failover** +* **Shared** +* **Key-shared** + +The messages are consumed in order for a single partition in Exclusive and Failover modes and out of order for Shared and +Key-shared mode. The main difference between Exclusive and Failover is that in Exclusive mode, the consumer is exclusive +to the entire topic, while in Failover mode, the consumer is exclusive to only one partition. Failover mode allows backup +consumer connections that are not consumed. The main difference between Shared and Key-shared is whether their dispatch +strategy is implemented via a key. For more information about subscription type, refer to the [Pulsar website](https://pulsar.apache.org/docs/3.2.x/concepts-messaging/). + + +### Acknowledgment + +The messages should be acknowledged after they are fully consumed and processed, and then the messages would not be received +for the same subscription again. Pulsar provides two ways to acknowledge messages: + +* **Cumulative acknowledgment** +* **Individual acknowledgment** + +Cumulative acknowledgment receives a message or a message id as a parameter and marks the messages before the message as +consumed for this subscription. For multiple-partition topics, the cumulative acknowledgment will work for the single +partition without impacting other partitions. Individual acknowledgment receives a message or a message id as a parameter +and only marks this message as consumed for this subscription. + + +### Messages Redeliver Mechanism + +There might be instances where the received messages cannot be processed at this time or some errors happened during processing. +The client needs to redeliver the unacknowledged messages or a particular message after a delay or immediately. +Pulsar provides at-least-once semantics when the client does not enable transaction because the client may cache some +messages out of Pulsar when redelivering messages. + +Pulsar Consumer API provides four ways to reconsume the unacknowledged messages later: + +* **ackTimeout** +* **deadLetterPolicy** + * **reconsumeLaterCumulative** + * **reconsumeLater** +* **negativeAcknowledge** +* **redeliverUnacknowledgedMessages** + +The **ackTimeout** is an automatic protection mechanism. If a consumer configured ackTimeout, the messages will be +auto-redelivered when the received messages are not acknowledged after a long time. It works at the client side, ensuring +the unacknowledged messages will be redelivered to another consumer when the connection of a consumer remains active +but the business system gets stuck or misses to ack some messages. If the consumer gets disconnected when the client crashes, +the messages will be auto-redelivered by the broker too. However, this mechanism does not wait for the response of acknowledgment, +so if an acknowledgment fails on the broker side or proxy side, an ack hole may occur. + +The **deadLetterPolicy** is a policy in the message queue used to handle messages that cannot be processed properly. +In many message queue systems (such as RabbitMQ, Google Pub/Sub, Apache Pulsar, etc.), this strategy is implemented. +In Pulsar, the deadLetterPolicy is implemented at the client side, it creates a new retry letter topic and dead letter +topic when building a consumer with `deadLetterPolicy` configuration. When a consumer calls `reconsumeLaterCumulative` +or `reconsumeLater`, the message (method parameter) will be produced to the retry letter topic until the retry time reaches +the `maxRedeliverCount`. The message will be produced to the dead letter topic when the retry time reaches the ` +maxRedeliverCount`. The main difference between them is that `reconsumeLaterCumulative` will cumulative ack the +message (method parameter) after it is produced and `reconsumeLater` will individual ack the message (method parameter) +after it is produced. + +The `negativeAcknowledge` is used to redeliver certain unacknowledged messages while `redeliverUnacknowledgedMessages` +is used to redeliver all the unacknowledged messages received by this consumer. The main difference between them and +deadLetterPolicy is that there is no new topic created, and there is an unlimited number of redeliveries. + +## Best Practice Suggestion + +Different scenarios require different best practices. Users who value the order of partition messages and wish to batch +process data should choose or implement an appropriate routerPolicy to send a batch of ordered messages to the same +partition. They should also select either Exclusive or Failover subscription modes. For users who do not care about +message order and those in stream processing scenarios, they can opt to use Shared and Key-shared subscription modes. + +### Shared && Key-shared +#### At-least-once +The term `At least once` ensures that the message is processed at least once, and there is a risk of duplicate messages, +but the performance is better than the `Exactly once` semantics. For the `At least once` semantics in shared or +key-shared mode, the most important matter is avoiding ack-hole as much as possible. The ack-hole refers to instances +where some single messages are missed for acknowledgment. This can occur in the following cases (All of these are the cases +where the connection is not interrupted, and reconnection will automatically resend the message.): + +1. Consumer receives messages but fails to process due to business system error. +2. Consumer receives messages but misses to process or the business system gets stuck. +3. Consumer acknowledges the message, but the acknowledge request is lost when sending to the broker. The possibility of +4. data loss exists in the case of TCP long connections, although the probability is extremely low. +  + +For case 1, configuring `deadLetterPolicy` is a good solution. When the business system receives a signal of message Review Comment: This explanation is redundant IMO. Let's expand at the section above. ########## docs/tutorials-redeliver-messages.md: ########## @@ -0,0 +1,236 @@ +--- +Id: tutorials-redeliver-messages +title: Consume best practice +sidebar_label: "Consume best practice" +description: Learn how to consume messages and redeliver unacknowledged messages in Pulsar. +--- + +# Consume Best Practice + +## Background Knowledge + +### Subscription Types + +Pulsar is a distributed message system where messages can be sent to topics by producers and consumed by consumers. +Consumers can subscribe to the topics in four ways (subscription types): + +* **Exclusive** +* **Failover** +* **Shared** +* **Key-shared** + +The messages are consumed in order for a single partition in Exclusive and Failover modes and out of order for Shared and +Key-shared mode. The main difference between Exclusive and Failover is that in Exclusive mode, the consumer is exclusive +to the entire topic, while in Failover mode, the consumer is exclusive to only one partition. Failover mode allows backup +consumer connections that are not consumed. The main difference between Shared and Key-shared is whether their dispatch +strategy is implemented via a key. For more information about subscription type, refer to the [Pulsar website](https://pulsar.apache.org/docs/3.2.x/concepts-messaging/). + + +### Acknowledgment + +The messages should be acknowledged after they are fully consumed and processed, and then the messages would not be received +for the same subscription again. Pulsar provides two ways to acknowledge messages: + +* **Cumulative acknowledgment** +* **Individual acknowledgment** + +Cumulative acknowledgment receives a message or a message id as a parameter and marks the messages before the message as +consumed for this subscription. For multiple-partition topics, the cumulative acknowledgment will work for the single +partition without impacting other partitions. Individual acknowledgment receives a message or a message id as a parameter +and only marks this message as consumed for this subscription. + + +### Messages Redeliver Mechanism + +There might be instances where the received messages cannot be processed at this time or some errors happened during processing. +The client needs to redeliver the unacknowledged messages or a particular message after a delay or immediately. +Pulsar provides at-least-once semantics when the client does not enable transaction because the client may cache some +messages out of Pulsar when redelivering messages. + +Pulsar Consumer API provides four ways to reconsume the unacknowledged messages later: + +* **ackTimeout** +* **deadLetterPolicy** + * **reconsumeLaterCumulative** + * **reconsumeLater** +* **negativeAcknowledge** +* **redeliverUnacknowledgedMessages** + +The **ackTimeout** is an automatic protection mechanism. If a consumer configured ackTimeout, the messages will be +auto-redelivered when the received messages are not acknowledged after a long time. It works at the client side, ensuring +the unacknowledged messages will be redelivered to another consumer when the connection of a consumer remains active +but the business system gets stuck or misses to ack some messages. If the consumer gets disconnected when the client crashes, +the messages will be auto-redelivered by the broker too. However, this mechanism does not wait for the response of acknowledgment, +so if an acknowledgment fails on the broker side or proxy side, an ack hole may occur. + +The **deadLetterPolicy** is a policy in the message queue used to handle messages that cannot be processed properly. +In many message queue systems (such as RabbitMQ, Google Pub/Sub, Apache Pulsar, etc.), this strategy is implemented. +In Pulsar, the deadLetterPolicy is implemented at the client side, it creates a new retry letter topic and dead letter +topic when building a consumer with `deadLetterPolicy` configuration. When a consumer calls `reconsumeLaterCumulative` +or `reconsumeLater`, the message (method parameter) will be produced to the retry letter topic until the retry time reaches +the `maxRedeliverCount`. The message will be produced to the dead letter topic when the retry time reaches the ` +maxRedeliverCount`. The main difference between them is that `reconsumeLaterCumulative` will cumulative ack the +message (method parameter) after it is produced and `reconsumeLater` will individual ack the message (method parameter) +after it is produced. + +The `negativeAcknowledge` is used to redeliver certain unacknowledged messages while `redeliverUnacknowledgedMessages` +is used to redeliver all the unacknowledged messages received by this consumer. The main difference between them and +deadLetterPolicy is that there is no new topic created, and there is an unlimited number of redeliveries. + +## Best Practice Suggestion + +Different scenarios require different best practices. Users who value the order of partition messages and wish to batch +process data should choose or implement an appropriate routerPolicy to send a batch of ordered messages to the same +partition. They should also select either Exclusive or Failover subscription modes. For users who do not care about +message order and those in stream processing scenarios, they can opt to use Shared and Key-shared subscription modes. + +### Shared && Key-shared +#### At-least-once +The term `At least once` ensures that the message is processed at least once, and there is a risk of duplicate messages, +but the performance is better than the `Exactly once` semantics. For the `At least once` semantics in shared or +key-shared mode, the most important matter is avoiding ack-hole as much as possible. The ack-hole refers to instances +where some single messages are missed for acknowledgment. This can occur in the following cases (All of these are the cases +where the connection is not interrupted, and reconnection will automatically resend the message.): + +1. Consumer receives messages but fails to process due to business system error. +2. Consumer receives messages but misses to process or the business system gets stuck. +3. Consumer acknowledges the message, but the acknowledge request is lost when sending to the broker. The possibility of +4. data loss exists in the case of TCP long connections, although the probability is extremely low. +  + +For case 1, configuring `deadLetterPolicy` is a good solution. When the business system receives a signal of message +processing failure, it can immediately check and decide whether to retry after a period of time. After they call +`reconsumeLater` API, the message will be acknowledged and resent to the retry letter topic that is automatically +subscribed by the consumer. When it reaches `maxRedeliverCount`, the message will be sent to the dead letter topic. +Messages sent to the dead letter topic should be considered as non-retryable messages, and we recommend setting an +initialSubscriptionName to avoid being deleted by the retention policy and then, let maintenance personnel regularly +handle non-retryable messages in the dead letter topic. + +```java + Consumer<Integer> consumer = pulsarClient.newConsumer(Schema.INT32) + .deadLetterPolicy(DeadLetterPolicy.builder() + .maxRedeliverCount(maxRedeliverCount) + .deadLetterTopic(deadLetterTopic) + .retryLetterTopic(retryLetterTopic) + .initialSubscriptionName(initialSubscriptionName) + .build()) + .subscriptionType(SubscriptionType.Shared) + .enableRetry(true) + .subscriptionInitialPosition(SubscriptionInitialPosition.Earliest) + .topic(topic) + .subscriptionName(sub) + .subscribe(); + Message<Integer> message = consumer.receive(); + try{ + // Process message + consumer.acknowledge(message); + }catch(Exception e){ + // Check whether to redeliver the message again. + if(e instanceof RetryException){ + consumer.reconsumeLater(message, delay, timeunit); + } + consumer.reconsumeLater(message, delay, timeunit); + } +``` + +For case 2, where the business system gets stuck or an error occurs which causes the received message to be missed for +processing, configuring ack timeout is a good solution. The consumer will record every message received on the client side. +If these messages have not been acknowledged after the specified time, the consumer will request the broker to resend these +messages to other brokers. +**Suggestion:** The ack timeout should be set slightly longer based on the message processing speed of the business system. +If the business system is still processing messages, but the processing time is too long or the timeout is set too small, +it may result in duplicate message consumption. + +```java + Consumer<Integer> consumer = pulsarClient.newConsumer(Schema.INT32) + .ackTimeout(tiemout, timeunit) + .subscriptionType(SubscriptionType.Shared) + .subscriptionInitialPosition(SubscriptionInitialPosition.Earliest) + .topic(topic) + .subscriptionName(sub) + .subscribe(); +``` + +For case 3, there are no effective preventive measures. This is because all methods of redelivery are triggered by the +client when the connection is not disconnected, and the client does not wait for an ack response by default. + +`Short-term solution`: This should be an extreme case where users can unload the topic to resolve after observing +an abnormal ack hole (existing for more than 3 * ack time out). +`Long-term solution`: Modify the ack-timeout mechanism to wait for the acknowledgment response. + +#### Exactly-once +If the users cannot tolerate message repetition, they can acknowledge messages with a transaction. Transaction can +prevent repeated consumption. If a message has been acknowledged, it will wait for a response and throw +`TransactionConflictException` when the client acknowledges the message with a transaction. + +**Notices:** When using transactions, do not configure DeadLetterPolicy, but instead use negativeAcknowledge to resend messages. + +```java + Consumer<Integer> consumer = pulsarClient.newConsumer(Schema.INT32) + .ackTimeout(tiemout, timeunit) + .subscriptionType(SubscriptionType.Shared) + .subscriptionInitialPosition(SubscriptionInitialPosition.Earliest) + .topic(topic) + .subscriptionName(sub) + .subscribe(); + Transaction transaction = pulsarClient.newTransaction().withTransactionTimeout(timeout, timeunit).build().get(); + + Message<Integer> message = consumer.receive(); + try { + // process message + consumer.acknowledgeAsync(message.getMessageId(), transaction); + transaction.commit().get(); + } catch (Exception e) { + if (!(e.getCause() instanceof PulsarClientException.TransactionConflictException)) { + consumer.negativeAcknowledge(message); + } + transaction.abort().get(); + } finally { + message.release(); + } +``` +### Exclusive && Failover +#### At-least-once +For `Exclusive` and `Failover` modes, which follow `At least once` semantics, it's crucial to focus on maintaining +the order of messages while ensuring none are lost. Users are recommended to use cumulative acknowledgment in the +`Exclusive` or `Failover` mode. Pulsar guarantees that the user has received all messages prior to a message that will +be cumulative acknowledged. In this mode, there will be no ack-hole and there is no need to redeliver a specific message. +It is also not recommended to redeliver a specific message as it can cause messages to be out of order. When there is a +problem with the processing of a batch of messages, it is recommended to use `redeliverUnacknowledgedMessages` to +redeliver all unprocessed messages to ensure the orderliness of the messages. +````java + Consumer<Integer> consumer = pulsarClient.newConsumer(Schema.INT32) + .subscriptionType(SubscriptionType.Exclusive) + .subscriptionInitialPosition(SubscriptionInitialPosition.Earliest) + .subscriptionName(sub) + .subscribe(); + Message<Integer> message = consumer.receive(); + try { + // process message + message = consumer.receive(); + // process message + // ...... + consumer.acknowledgeCumulative(message); + } catch (Exception e) { + consumer.redeliverUnacknowledgedMessages(); + } finally { + message.release(); + } +```` + +#### Exactly-once (Beta) +In the `Exclusive` and `Failover` mode, the most troublesome issue for users is the problem of duplicate messages caused Review Comment: Didn't understand anything here. ########## static/img/blog-consume-best-practice/Ack-hole.png: ########## Review Comment: Any chance to use mermaid for diagrams? ########## docs/tutorials-redeliver-messages.md: ########## @@ -0,0 +1,236 @@ +--- +Id: tutorials-redeliver-messages +title: Consume best practice +sidebar_label: "Consume best practice" +description: Learn how to consume messages and redeliver unacknowledged messages in Pulsar. +--- + +# Consume Best Practice + +## Background Knowledge + +### Subscription Types + +Pulsar is a distributed message system where messages can be sent to topics by producers and consumed by consumers. +Consumers can subscribe to the topics in four ways (subscription types): + +* **Exclusive** +* **Failover** +* **Shared** +* **Key-shared** + +The messages are consumed in order for a single partition in Exclusive and Failover modes and out of order for Shared and +Key-shared mode. The main difference between Exclusive and Failover is that in Exclusive mode, the consumer is exclusive +to the entire topic, while in Failover mode, the consumer is exclusive to only one partition. Failover mode allows backup +consumer connections that are not consumed. The main difference between Shared and Key-shared is whether their dispatch +strategy is implemented via a key. For more information about subscription type, refer to the [Pulsar website](https://pulsar.apache.org/docs/3.2.x/concepts-messaging/). + + +### Acknowledgment + +The messages should be acknowledged after they are fully consumed and processed, and then the messages would not be received +for the same subscription again. Pulsar provides two ways to acknowledge messages: + +* **Cumulative acknowledgment** +* **Individual acknowledgment** + +Cumulative acknowledgment receives a message or a message id as a parameter and marks the messages before the message as +consumed for this subscription. For multiple-partition topics, the cumulative acknowledgment will work for the single +partition without impacting other partitions. Individual acknowledgment receives a message or a message id as a parameter +and only marks this message as consumed for this subscription. + + +### Messages Redeliver Mechanism + +There might be instances where the received messages cannot be processed at this time or some errors happened during processing. +The client needs to redeliver the unacknowledged messages or a particular message after a delay or immediately. +Pulsar provides at-least-once semantics when the client does not enable transaction because the client may cache some +messages out of Pulsar when redelivering messages. + +Pulsar Consumer API provides four ways to reconsume the unacknowledged messages later: + +* **ackTimeout** +* **deadLetterPolicy** + * **reconsumeLaterCumulative** + * **reconsumeLater** +* **negativeAcknowledge** +* **redeliverUnacknowledgedMessages** + +The **ackTimeout** is an automatic protection mechanism. If a consumer configured ackTimeout, the messages will be +auto-redelivered when the received messages are not acknowledged after a long time. It works at the client side, ensuring Review Comment: When a redeliver command is issued, is guaranteed to dispatch it to a different consumer? ########## docs/tutorials-redeliver-messages.md: ########## @@ -0,0 +1,236 @@ +--- +Id: tutorials-redeliver-messages +title: Consume best practice +sidebar_label: "Consume best practice" +description: Learn how to consume messages and redeliver unacknowledged messages in Pulsar. +--- + +# Consume Best Practice + +## Background Knowledge + +### Subscription Types + +Pulsar is a distributed message system where messages can be sent to topics by producers and consumed by consumers. +Consumers can subscribe to the topics in four ways (subscription types): + +* **Exclusive** +* **Failover** +* **Shared** +* **Key-shared** + +The messages are consumed in order for a single partition in Exclusive and Failover modes and out of order for Shared and +Key-shared mode. The main difference between Exclusive and Failover is that in Exclusive mode, the consumer is exclusive +to the entire topic, while in Failover mode, the consumer is exclusive to only one partition. Failover mode allows backup +consumer connections that are not consumed. The main difference between Shared and Key-shared is whether their dispatch +strategy is implemented via a key. For more information about subscription type, refer to the [Pulsar website](https://pulsar.apache.org/docs/3.2.x/concepts-messaging/). + + +### Acknowledgment + +The messages should be acknowledged after they are fully consumed and processed, and then the messages would not be received +for the same subscription again. Pulsar provides two ways to acknowledge messages: + +* **Cumulative acknowledgment** +* **Individual acknowledgment** + +Cumulative acknowledgment receives a message or a message id as a parameter and marks the messages before the message as +consumed for this subscription. For multiple-partition topics, the cumulative acknowledgment will work for the single +partition without impacting other partitions. Individual acknowledgment receives a message or a message id as a parameter +and only marks this message as consumed for this subscription. + + +### Messages Redeliver Mechanism + +There might be instances where the received messages cannot be processed at this time or some errors happened during processing. +The client needs to redeliver the unacknowledged messages or a particular message after a delay or immediately. +Pulsar provides at-least-once semantics when the client does not enable transaction because the client may cache some +messages out of Pulsar when redelivering messages. + +Pulsar Consumer API provides four ways to reconsume the unacknowledged messages later: + +* **ackTimeout** +* **deadLetterPolicy** + * **reconsumeLaterCumulative** + * **reconsumeLater** +* **negativeAcknowledge** +* **redeliverUnacknowledgedMessages** + +The **ackTimeout** is an automatic protection mechanism. If a consumer configured ackTimeout, the messages will be +auto-redelivered when the received messages are not acknowledged after a long time. It works at the client side, ensuring +the unacknowledged messages will be redelivered to another consumer when the connection of a consumer remains active +but the business system gets stuck or misses to ack some messages. If the consumer gets disconnected when the client crashes, +the messages will be auto-redelivered by the broker too. However, this mechanism does not wait for the response of acknowledgment, +so if an acknowledgment fails on the broker side or proxy side, an ack hole may occur. + +The **deadLetterPolicy** is a policy in the message queue used to handle messages that cannot be processed properly. +In many message queue systems (such as RabbitMQ, Google Pub/Sub, Apache Pulsar, etc.), this strategy is implemented. +In Pulsar, the deadLetterPolicy is implemented at the client side, it creates a new retry letter topic and dead letter +topic when building a consumer with `deadLetterPolicy` configuration. When a consumer calls `reconsumeLaterCumulative` +or `reconsumeLater`, the message (method parameter) will be produced to the retry letter topic until the retry time reaches +the `maxRedeliverCount`. The message will be produced to the dead letter topic when the retry time reaches the ` +maxRedeliverCount`. The main difference between them is that `reconsumeLaterCumulative` will cumulative ack the Review Comment: I can not understand why would you use cumulativeAck with retry. ########## docs/tutorials-redeliver-messages.md: ########## @@ -0,0 +1,236 @@ +--- +Id: tutorials-redeliver-messages +title: Consume best practice +sidebar_label: "Consume best practice" +description: Learn how to consume messages and redeliver unacknowledged messages in Pulsar. +--- + +# Consume Best Practice + +## Background Knowledge + +### Subscription Types + +Pulsar is a distributed message system where messages can be sent to topics by producers and consumed by consumers. +Consumers can subscribe to the topics in four ways (subscription types): + +* **Exclusive** +* **Failover** +* **Shared** +* **Key-shared** + +The messages are consumed in order for a single partition in Exclusive and Failover modes and out of order for Shared and +Key-shared mode. The main difference between Exclusive and Failover is that in Exclusive mode, the consumer is exclusive +to the entire topic, while in Failover mode, the consumer is exclusive to only one partition. Failover mode allows backup +consumer connections that are not consumed. The main difference between Shared and Key-shared is whether their dispatch +strategy is implemented via a key. For more information about subscription type, refer to the [Pulsar website](https://pulsar.apache.org/docs/3.2.x/concepts-messaging/). + + +### Acknowledgment + +The messages should be acknowledged after they are fully consumed and processed, and then the messages would not be received +for the same subscription again. Pulsar provides two ways to acknowledge messages: + +* **Cumulative acknowledgment** +* **Individual acknowledgment** + +Cumulative acknowledgment receives a message or a message id as a parameter and marks the messages before the message as +consumed for this subscription. For multiple-partition topics, the cumulative acknowledgment will work for the single +partition without impacting other partitions. Individual acknowledgment receives a message or a message id as a parameter +and only marks this message as consumed for this subscription. + + +### Messages Redeliver Mechanism + +There might be instances where the received messages cannot be processed at this time or some errors happened during processing. +The client needs to redeliver the unacknowledged messages or a particular message after a delay or immediately. +Pulsar provides at-least-once semantics when the client does not enable transaction because the client may cache some +messages out of Pulsar when redelivering messages. + +Pulsar Consumer API provides four ways to reconsume the unacknowledged messages later: + +* **ackTimeout** +* **deadLetterPolicy** + * **reconsumeLaterCumulative** + * **reconsumeLater** +* **negativeAcknowledge** +* **redeliverUnacknowledgedMessages** + +The **ackTimeout** is an automatic protection mechanism. If a consumer configured ackTimeout, the messages will be Review Comment: I would write it a bit differently: After "." I would write: A consumer can configure an `ackTimeout`. Once done any message returned to the user via `consume()` will have an `ackTimeut` period to acknowledge it. If not, the client will instruct the broker to redeliver this message to another consumer. It's great when your code consumed the messages, but never acknowledged since it was stuck. Some other consumer might be able to process it successfully. QUESTION: Can we instruct max redeliver? ########## docs/tutorials-redeliver-messages.md: ########## @@ -0,0 +1,236 @@ +--- +Id: tutorials-redeliver-messages +title: Consume best practice +sidebar_label: "Consume best practice" +description: Learn how to consume messages and redeliver unacknowledged messages in Pulsar. +--- + +# Consume Best Practice + +## Background Knowledge + +### Subscription Types + +Pulsar is a distributed message system where messages can be sent to topics by producers and consumed by consumers. +Consumers can subscribe to the topics in four ways (subscription types): + +* **Exclusive** +* **Failover** +* **Shared** +* **Key-shared** + +The messages are consumed in order for a single partition in Exclusive and Failover modes and out of order for Shared and +Key-shared mode. The main difference between Exclusive and Failover is that in Exclusive mode, the consumer is exclusive +to the entire topic, while in Failover mode, the consumer is exclusive to only one partition. Failover mode allows backup +consumer connections that are not consumed. The main difference between Shared and Key-shared is whether their dispatch +strategy is implemented via a key. For more information about subscription type, refer to the [Pulsar website](https://pulsar.apache.org/docs/3.2.x/concepts-messaging/). + + +### Acknowledgment + +The messages should be acknowledged after they are fully consumed and processed, and then the messages would not be received +for the same subscription again. Pulsar provides two ways to acknowledge messages: + +* **Cumulative acknowledgment** +* **Individual acknowledgment** + +Cumulative acknowledgment receives a message or a message id as a parameter and marks the messages before the message as +consumed for this subscription. For multiple-partition topics, the cumulative acknowledgment will work for the single +partition without impacting other partitions. Individual acknowledgment receives a message or a message id as a parameter +and only marks this message as consumed for this subscription. + + +### Messages Redeliver Mechanism + +There might be instances where the received messages cannot be processed at this time or some errors happened during processing. +The client needs to redeliver the unacknowledged messages or a particular message after a delay or immediately. +Pulsar provides at-least-once semantics when the client does not enable transaction because the client may cache some +messages out of Pulsar when redelivering messages. + +Pulsar Consumer API provides four ways to reconsume the unacknowledged messages later: + +* **ackTimeout** +* **deadLetterPolicy** + * **reconsumeLaterCumulative** + * **reconsumeLater** +* **negativeAcknowledge** +* **redeliverUnacknowledgedMessages** + +The **ackTimeout** is an automatic protection mechanism. If a consumer configured ackTimeout, the messages will be +auto-redelivered when the received messages are not acknowledged after a long time. It works at the client side, ensuring +the unacknowledged messages will be redelivered to another consumer when the connection of a consumer remains active +but the business system gets stuck or misses to ack some messages. If the consumer gets disconnected when the client crashes, +the messages will be auto-redelivered by the broker too. However, this mechanism does not wait for the response of acknowledgment, +so if an acknowledgment fails on the broker side or proxy side, an ack hole may occur. + +The **deadLetterPolicy** is a policy in the message queue used to handle messages that cannot be processed properly. +In many message queue systems (such as RabbitMQ, Google Pub/Sub, Apache Pulsar, etc.), this strategy is implemented. +In Pulsar, the deadLetterPolicy is implemented at the client side, it creates a new retry letter topic and dead letter +topic when building a consumer with `deadLetterPolicy` configuration. When a consumer calls `reconsumeLaterCumulative` +or `reconsumeLater`, the message (method parameter) will be produced to the retry letter topic until the retry time reaches +the `maxRedeliverCount`. The message will be produced to the dead letter topic when the retry time reaches the ` +maxRedeliverCount`. The main difference between them is that `reconsumeLaterCumulative` will cumulative ack the +message (method parameter) after it is produced and `reconsumeLater` will individual ack the message (method parameter) +after it is produced. + +The `negativeAcknowledge` is used to redeliver certain unacknowledged messages while `redeliverUnacknowledgedMessages` +is used to redeliver all the unacknowledged messages received by this consumer. The main difference between them and +deadLetterPolicy is that there is no new topic created, and there is an unlimited number of redeliveries. + +## Best Practice Suggestion + +Different scenarios require different best practices. Users who value the order of partition messages and wish to batch +process data should choose or implement an appropriate routerPolicy to send a batch of ordered messages to the same +partition. They should also select either Exclusive or Failover subscription modes. For users who do not care about +message order and those in stream processing scenarios, they can opt to use Shared and Key-shared subscription modes. + +### Shared && Key-shared +#### At-least-once +The term `At least once` ensures that the message is processed at least once, and there is a risk of duplicate messages, +but the performance is better than the `Exactly once` semantics. For the `At least once` semantics in shared or +key-shared mode, the most important matter is avoiding ack-hole as much as possible. The ack-hole refers to instances +where some single messages are missed for acknowledgment. This can occur in the following cases (All of these are the cases +where the connection is not interrupted, and reconnection will automatically resend the message.): + +1. Consumer receives messages but fails to process due to business system error. +2. Consumer receives messages but misses to process or the business system gets stuck. +3. Consumer acknowledges the message, but the acknowledge request is lost when sending to the broker. The possibility of +4. data loss exists in the case of TCP long connections, although the probability is extremely low. +  + +For case 1, configuring `deadLetterPolicy` is a good solution. When the business system receives a signal of message +processing failure, it can immediately check and decide whether to retry after a period of time. After they call +`reconsumeLater` API, the message will be acknowledged and resent to the retry letter topic that is automatically +subscribed by the consumer. When it reaches `maxRedeliverCount`, the message will be sent to the dead letter topic. +Messages sent to the dead letter topic should be considered as non-retryable messages, and we recommend setting an +initialSubscriptionName to avoid being deleted by the retention policy and then, let maintenance personnel regularly +handle non-retryable messages in the dead letter topic. + +```java + Consumer<Integer> consumer = pulsarClient.newConsumer(Schema.INT32) + .deadLetterPolicy(DeadLetterPolicy.builder() + .maxRedeliverCount(maxRedeliverCount) + .deadLetterTopic(deadLetterTopic) + .retryLetterTopic(retryLetterTopic) + .initialSubscriptionName(initialSubscriptionName) + .build()) + .subscriptionType(SubscriptionType.Shared) + .enableRetry(true) + .subscriptionInitialPosition(SubscriptionInitialPosition.Earliest) + .topic(topic) + .subscriptionName(sub) + .subscribe(); + Message<Integer> message = consumer.receive(); + try{ + // Process message + consumer.acknowledge(message); + }catch(Exception e){ + // Check whether to redeliver the message again. + if(e instanceof RetryException){ + consumer.reconsumeLater(message, delay, timeunit); + } + consumer.reconsumeLater(message, delay, timeunit); + } +``` + +For case 2, where the business system gets stuck or an error occurs which causes the received message to be missed for +processing, configuring ack timeout is a good solution. The consumer will record every message received on the client side. +If these messages have not been acknowledged after the specified time, the consumer will request the broker to resend these +messages to other brokers. +**Suggestion:** The ack timeout should be set slightly longer based on the message processing speed of the business system. +If the business system is still processing messages, but the processing time is too long or the timeout is set too small, +it may result in duplicate message consumption. + +```java + Consumer<Integer> consumer = pulsarClient.newConsumer(Schema.INT32) + .ackTimeout(tiemout, timeunit) + .subscriptionType(SubscriptionType.Shared) + .subscriptionInitialPosition(SubscriptionInitialPosition.Earliest) + .topic(topic) + .subscriptionName(sub) + .subscribe(); +``` + +For case 3, there are no effective preventive measures. This is because all methods of redelivery are triggered by the +client when the connection is not disconnected, and the client does not wait for an ack response by default. + +`Short-term solution`: This should be an extreme case where users can unload the topic to resolve after observing +an abnormal ack hole (existing for more than 3 * ack time out). +`Long-term solution`: Modify the ack-timeout mechanism to wait for the acknowledgment response. + +#### Exactly-once +If the users cannot tolerate message repetition, they can acknowledge messages with a transaction. Transaction can +prevent repeated consumption. If a message has been acknowledged, it will wait for a response and throw +`TransactionConflictException` when the client acknowledges the message with a transaction. + +**Notices:** When using transactions, do not configure DeadLetterPolicy, but instead use negativeAcknowledge to resend messages. + +```java + Consumer<Integer> consumer = pulsarClient.newConsumer(Schema.INT32) + .ackTimeout(tiemout, timeunit) + .subscriptionType(SubscriptionType.Shared) + .subscriptionInitialPosition(SubscriptionInitialPosition.Earliest) + .topic(topic) + .subscriptionName(sub) + .subscribe(); + Transaction transaction = pulsarClient.newTransaction().withTransactionTimeout(timeout, timeunit).build().get(); + + Message<Integer> message = consumer.receive(); + try { + // process message + consumer.acknowledgeAsync(message.getMessageId(), transaction); + transaction.commit().get(); + } catch (Exception e) { + if (!(e.getCause() instanceof PulsarClientException.TransactionConflictException)) { + consumer.negativeAcknowledge(message); + } + transaction.abort().get(); + } finally { + message.release(); + } +``` +### Exclusive && Failover Review Comment: We have to stop here IMO. Something here in the structure makes it way too long IMO. ########## docs/tutorials-redeliver-messages.md: ########## @@ -0,0 +1,236 @@ +--- +Id: tutorials-redeliver-messages +title: Consume best practice +sidebar_label: "Consume best practice" +description: Learn how to consume messages and redeliver unacknowledged messages in Pulsar. +--- + +# Consume Best Practice + +## Background Knowledge + +### Subscription Types + +Pulsar is a distributed message system where messages can be sent to topics by producers and consumed by consumers. +Consumers can subscribe to the topics in four ways (subscription types): + +* **Exclusive** +* **Failover** +* **Shared** +* **Key-shared** + +The messages are consumed in order for a single partition in Exclusive and Failover modes and out of order for Shared and +Key-shared mode. The main difference between Exclusive and Failover is that in Exclusive mode, the consumer is exclusive +to the entire topic, while in Failover mode, the consumer is exclusive to only one partition. Failover mode allows backup +consumer connections that are not consumed. The main difference between Shared and Key-shared is whether their dispatch +strategy is implemented via a key. For more information about subscription type, refer to the [Pulsar website](https://pulsar.apache.org/docs/3.2.x/concepts-messaging/). + + +### Acknowledgment + +The messages should be acknowledged after they are fully consumed and processed, and then the messages would not be received +for the same subscription again. Pulsar provides two ways to acknowledge messages: + +* **Cumulative acknowledgment** +* **Individual acknowledgment** + +Cumulative acknowledgment receives a message or a message id as a parameter and marks the messages before the message as +consumed for this subscription. For multiple-partition topics, the cumulative acknowledgment will work for the single Review Comment: "For multiple partitions..." --> When consuming you might get messages from more then one partition. You need to make sure you call cumulative-acknowlege per the partition. @BewareMyPower I need help here ########## docs/tutorials-redeliver-messages.md: ########## @@ -0,0 +1,236 @@ +--- +Id: tutorials-redeliver-messages +title: Consume best practice +sidebar_label: "Consume best practice" +description: Learn how to consume messages and redeliver unacknowledged messages in Pulsar. +--- + +# Consume Best Practice + +## Background Knowledge + +### Subscription Types + +Pulsar is a distributed message system where messages can be sent to topics by producers and consumed by consumers. +Consumers can subscribe to the topics in four ways (subscription types): + +* **Exclusive** +* **Failover** +* **Shared** +* **Key-shared** + +The messages are consumed in order for a single partition in Exclusive and Failover modes and out of order for Shared and +Key-shared mode. The main difference between Exclusive and Failover is that in Exclusive mode, the consumer is exclusive +to the entire topic, while in Failover mode, the consumer is exclusive to only one partition. Failover mode allows backup +consumer connections that are not consumed. The main difference between Shared and Key-shared is whether their dispatch Review Comment: Too confusing. Key-shared allows to have messages of the same key (e.g. customer-id) sent to the same consumer, where in Shared the message are sent in a round-robin manner to all consumers. ########## docs/tutorials-redeliver-messages.md: ########## @@ -0,0 +1,236 @@ +--- +Id: tutorials-redeliver-messages +title: Consume best practice +sidebar_label: "Consume best practice" +description: Learn how to consume messages and redeliver unacknowledged messages in Pulsar. +--- + +# Consume Best Practice + +## Background Knowledge + +### Subscription Types + +Pulsar is a distributed message system where messages can be sent to topics by producers and consumed by consumers. +Consumers can subscribe to the topics in four ways (subscription types): + +* **Exclusive** +* **Failover** +* **Shared** +* **Key-shared** + +The messages are consumed in order for a single partition in Exclusive and Failover modes and out of order for Shared and +Key-shared mode. The main difference between Exclusive and Failover is that in Exclusive mode, the consumer is exclusive +to the entire topic, while in Failover mode, the consumer is exclusive to only one partition. Failover mode allows backup +consumer connections that are not consumed. The main difference between Shared and Key-shared is whether their dispatch +strategy is implemented via a key. For more information about subscription type, refer to the [Pulsar website](https://pulsar.apache.org/docs/3.2.x/concepts-messaging/). + + +### Acknowledgment + +The messages should be acknowledged after they are fully consumed and processed, and then the messages would not be received +for the same subscription again. Pulsar provides two ways to acknowledge messages: + +* **Cumulative acknowledgment** +* **Individual acknowledgment** + +Cumulative acknowledgment receives a message or a message id as a parameter and marks the messages before the message as +consumed for this subscription. For multiple-partition topics, the cumulative acknowledgment will work for the single +partition without impacting other partitions. Individual acknowledgment receives a message or a message id as a parameter +and only marks this message as consumed for this subscription. + + +### Messages Redeliver Mechanism Review Comment: Redelivery ########## docs/tutorials-redeliver-messages.md: ########## @@ -0,0 +1,236 @@ +--- +Id: tutorials-redeliver-messages +title: Consume best practice +sidebar_label: "Consume best practice" +description: Learn how to consume messages and redeliver unacknowledged messages in Pulsar. +--- + +# Consume Best Practice + +## Background Knowledge + +### Subscription Types + +Pulsar is a distributed message system where messages can be sent to topics by producers and consumed by consumers. +Consumers can subscribe to the topics in four ways (subscription types): + +* **Exclusive** +* **Failover** +* **Shared** +* **Key-shared** + +The messages are consumed in order for a single partition in Exclusive and Failover modes and out of order for Shared and +Key-shared mode. The main difference between Exclusive and Failover is that in Exclusive mode, the consumer is exclusive +to the entire topic, while in Failover mode, the consumer is exclusive to only one partition. Failover mode allows backup +consumer connections that are not consumed. The main difference between Shared and Key-shared is whether their dispatch +strategy is implemented via a key. For more information about subscription type, refer to the [Pulsar website](https://pulsar.apache.org/docs/3.2.x/concepts-messaging/). + + +### Acknowledgment + +The messages should be acknowledged after they are fully consumed and processed, and then the messages would not be received +for the same subscription again. Pulsar provides two ways to acknowledge messages: + +* **Cumulative acknowledgment** +* **Individual acknowledgment** + +Cumulative acknowledgment receives a message or a message id as a parameter and marks the messages before the message as +consumed for this subscription. For multiple-partition topics, the cumulative acknowledgment will work for the single +partition without impacting other partitions. Individual acknowledgment receives a message or a message id as a parameter +and only marks this message as consumed for this subscription. + + +### Messages Redeliver Mechanism + +There might be instances where the received messages cannot be processed at this time or some errors happened during processing. +The client needs to redeliver the unacknowledged messages or a particular message after a delay or immediately. +Pulsar provides at-least-once semantics when the client does not enable transaction because the client may cache some +messages out of Pulsar when redelivering messages. + +Pulsar Consumer API provides four ways to reconsume the unacknowledged messages later: + +* **ackTimeout** +* **deadLetterPolicy** + * **reconsumeLaterCumulative** + * **reconsumeLater** +* **negativeAcknowledge** +* **redeliverUnacknowledgedMessages** + +The **ackTimeout** is an automatic protection mechanism. If a consumer configured ackTimeout, the messages will be +auto-redelivered when the received messages are not acknowledged after a long time. It works at the client side, ensuring +the unacknowledged messages will be redelivered to another consumer when the connection of a consumer remains active +but the business system gets stuck or misses to ack some messages. If the consumer gets disconnected when the client crashes, +the messages will be auto-redelivered by the broker too. However, this mechanism does not wait for the response of acknowledgment, Review Comment: Can you please explain that. I can't understand. ########## docs/tutorials-redeliver-messages.md: ########## @@ -0,0 +1,236 @@ +--- +Id: tutorials-redeliver-messages +title: Consume best practice +sidebar_label: "Consume best practice" +description: Learn how to consume messages and redeliver unacknowledged messages in Pulsar. +--- + +# Consume Best Practice + +## Background Knowledge + +### Subscription Types + +Pulsar is a distributed message system where messages can be sent to topics by producers and consumed by consumers. +Consumers can subscribe to the topics in four ways (subscription types): + +* **Exclusive** +* **Failover** +* **Shared** +* **Key-shared** + +The messages are consumed in order for a single partition in Exclusive and Failover modes and out of order for Shared and +Key-shared mode. The main difference between Exclusive and Failover is that in Exclusive mode, the consumer is exclusive +to the entire topic, while in Failover mode, the consumer is exclusive to only one partition. Failover mode allows backup +consumer connections that are not consumed. The main difference between Shared and Key-shared is whether their dispatch +strategy is implemented via a key. For more information about subscription type, refer to the [Pulsar website](https://pulsar.apache.org/docs/3.2.x/concepts-messaging/). + + +### Acknowledgment + +The messages should be acknowledged after they are fully consumed and processed, and then the messages would not be received +for the same subscription again. Pulsar provides two ways to acknowledge messages: + +* **Cumulative acknowledgment** +* **Individual acknowledgment** + +Cumulative acknowledgment receives a message or a message id as a parameter and marks the messages before the message as +consumed for this subscription. For multiple-partition topics, the cumulative acknowledgment will work for the single Review Comment: "subscription - in effect: all messages up to the selected message will be marked as processed (acknowledged)" ########## docs/tutorials-redeliver-messages.md: ########## @@ -0,0 +1,236 @@ +--- +Id: tutorials-redeliver-messages +title: Consume best practice +sidebar_label: "Consume best practice" +description: Learn how to consume messages and redeliver unacknowledged messages in Pulsar. +--- + +# Consume Best Practice + +## Background Knowledge + +### Subscription Types + +Pulsar is a distributed message system where messages can be sent to topics by producers and consumed by consumers. +Consumers can subscribe to the topics in four ways (subscription types): + +* **Exclusive** +* **Failover** +* **Shared** +* **Key-shared** + +The messages are consumed in order for a single partition in Exclusive and Failover modes and out of order for Shared and +Key-shared mode. The main difference between Exclusive and Failover is that in Exclusive mode, the consumer is exclusive +to the entire topic, while in Failover mode, the consumer is exclusive to only one partition. Failover mode allows backup +consumer connections that are not consumed. The main difference between Shared and Key-shared is whether their dispatch +strategy is implemented via a key. For more information about subscription type, refer to the [Pulsar website](https://pulsar.apache.org/docs/3.2.x/concepts-messaging/). + + +### Acknowledgment + +The messages should be acknowledged after they are fully consumed and processed, and then the messages would not be received +for the same subscription again. Pulsar provides two ways to acknowledge messages: + +* **Cumulative acknowledgment** +* **Individual acknowledgment** + +Cumulative acknowledgment receives a message or a message id as a parameter and marks the messages before the message as +consumed for this subscription. For multiple-partition topics, the cumulative acknowledgment will work for the single +partition without impacting other partitions. Individual acknowledgment receives a message or a message id as a parameter +and only marks this message as consumed for this subscription. + + +### Messages Redeliver Mechanism + +There might be instances where the received messages cannot be processed at this time or some errors happened during processing. +The client needs to redeliver the unacknowledged messages or a particular message after a delay or immediately. Review Comment: Immediately practically makes no sense. You have the message in your hand. Also, let's start by saying: This is mostly relevant to Shared subscription type ########## docs/tutorials-redeliver-messages.md: ########## @@ -0,0 +1,236 @@ +--- +Id: tutorials-redeliver-messages +title: Consume best practice +sidebar_label: "Consume best practice" +description: Learn how to consume messages and redeliver unacknowledged messages in Pulsar. +--- + +# Consume Best Practice + +## Background Knowledge + +### Subscription Types + +Pulsar is a distributed message system where messages can be sent to topics by producers and consumed by consumers. +Consumers can subscribe to the topics in four ways (subscription types): + +* **Exclusive** +* **Failover** +* **Shared** +* **Key-shared** + +The messages are consumed in order for a single partition in Exclusive and Failover modes and out of order for Shared and +Key-shared mode. The main difference between Exclusive and Failover is that in Exclusive mode, the consumer is exclusive +to the entire topic, while in Failover mode, the consumer is exclusive to only one partition. Failover mode allows backup +consumer connections that are not consumed. The main difference between Shared and Key-shared is whether their dispatch +strategy is implemented via a key. For more information about subscription type, refer to the [Pulsar website](https://pulsar.apache.org/docs/3.2.x/concepts-messaging/). + + +### Acknowledgment + +The messages should be acknowledged after they are fully consumed and processed, and then the messages would not be received +for the same subscription again. Pulsar provides two ways to acknowledge messages: + +* **Cumulative acknowledgment** +* **Individual acknowledgment** + +Cumulative acknowledgment receives a message or a message id as a parameter and marks the messages before the message as +consumed for this subscription. For multiple-partition topics, the cumulative acknowledgment will work for the single +partition without impacting other partitions. Individual acknowledgment receives a message or a message id as a parameter +and only marks this message as consumed for this subscription. + + +### Messages Redeliver Mechanism + +There might be instances where the received messages cannot be processed at this time or some errors happened during processing. +The client needs to redeliver the unacknowledged messages or a particular message after a delay or immediately. +Pulsar provides at-least-once semantics when the client does not enable transaction because the client may cache some +messages out of Pulsar when redelivering messages. + +Pulsar Consumer API provides four ways to reconsume the unacknowledged messages later: + +* **ackTimeout** +* **deadLetterPolicy** + * **reconsumeLaterCumulative** + * **reconsumeLater** +* **negativeAcknowledge** +* **redeliverUnacknowledgedMessages** + +The **ackTimeout** is an automatic protection mechanism. If a consumer configured ackTimeout, the messages will be +auto-redelivered when the received messages are not acknowledged after a long time. It works at the client side, ensuring +the unacknowledged messages will be redelivered to another consumer when the connection of a consumer remains active +but the business system gets stuck or misses to ack some messages. If the consumer gets disconnected when the client crashes, +the messages will be auto-redelivered by the broker too. However, this mechanism does not wait for the response of acknowledgment, +so if an acknowledgment fails on the broker side or proxy side, an ack hole may occur. + +The **deadLetterPolicy** is a policy in the message queue used to handle messages that cannot be processed properly. +In many message queue systems (such as RabbitMQ, Google Pub/Sub, Apache Pulsar, etc.), this strategy is implemented. +In Pulsar, the deadLetterPolicy is implemented at the client side, it creates a new retry letter topic and dead letter +topic when building a consumer with `deadLetterPolicy` configuration. When a consumer calls `reconsumeLaterCumulative` +or `reconsumeLater`, the message (method parameter) will be produced to the retry letter topic until the retry time reaches Review Comment: Let add: the message will be acknowledged, and a new message, which is a copy of the message read, will be produced to the retry letter topic, bearing a retry number header. When the retry number exceed `maxRedeliverCount`, the message will be produced instead of the dead-letter topic. When calling `reconsumerLater(msg, delay)` the message written to the retry letter topic will produced with delivery delay - it will be persisted immediately by the broker, but will only be delivered to the consumer after `delay` has passed. It's perfect to space out retries. ########## docs/tutorials-redeliver-messages.md: ########## @@ -0,0 +1,236 @@ +--- +Id: tutorials-redeliver-messages +title: Consume best practice +sidebar_label: "Consume best practice" +description: Learn how to consume messages and redeliver unacknowledged messages in Pulsar. +--- + +# Consume Best Practice + +## Background Knowledge + +### Subscription Types + +Pulsar is a distributed message system where messages can be sent to topics by producers and consumed by consumers. +Consumers can subscribe to the topics in four ways (subscription types): + +* **Exclusive** +* **Failover** +* **Shared** +* **Key-shared** + +The messages are consumed in order for a single partition in Exclusive and Failover modes and out of order for Shared and +Key-shared mode. The main difference between Exclusive and Failover is that in Exclusive mode, the consumer is exclusive +to the entire topic, while in Failover mode, the consumer is exclusive to only one partition. Failover mode allows backup +consumer connections that are not consumed. The main difference between Shared and Key-shared is whether their dispatch +strategy is implemented via a key. For more information about subscription type, refer to the [Pulsar website](https://pulsar.apache.org/docs/3.2.x/concepts-messaging/). + + +### Acknowledgment + +The messages should be acknowledged after they are fully consumed and processed, and then the messages would not be received +for the same subscription again. Pulsar provides two ways to acknowledge messages: + +* **Cumulative acknowledgment** +* **Individual acknowledgment** + +Cumulative acknowledgment receives a message or a message id as a parameter and marks the messages before the message as +consumed for this subscription. For multiple-partition topics, the cumulative acknowledgment will work for the single +partition without impacting other partitions. Individual acknowledgment receives a message or a message id as a parameter +and only marks this message as consumed for this subscription. + + +### Messages Redeliver Mechanism + +There might be instances where the received messages cannot be processed at this time or some errors happened during processing. +The client needs to redeliver the unacknowledged messages or a particular message after a delay or immediately. +Pulsar provides at-least-once semantics when the client does not enable transaction because the client may cache some +messages out of Pulsar when redelivering messages. + +Pulsar Consumer API provides four ways to reconsume the unacknowledged messages later: + +* **ackTimeout** +* **deadLetterPolicy** + * **reconsumeLaterCumulative** + * **reconsumeLater** +* **negativeAcknowledge** +* **redeliverUnacknowledgedMessages** + +The **ackTimeout** is an automatic protection mechanism. If a consumer configured ackTimeout, the messages will be +auto-redelivered when the received messages are not acknowledged after a long time. It works at the client side, ensuring +the unacknowledged messages will be redelivered to another consumer when the connection of a consumer remains active +but the business system gets stuck or misses to ack some messages. If the consumer gets disconnected when the client crashes, +the messages will be auto-redelivered by the broker too. However, this mechanism does not wait for the response of acknowledgment, +so if an acknowledgment fails on the broker side or proxy side, an ack hole may occur. + +The **deadLetterPolicy** is a policy in the message queue used to handle messages that cannot be processed properly. +In many message queue systems (such as RabbitMQ, Google Pub/Sub, Apache Pulsar, etc.), this strategy is implemented. +In Pulsar, the deadLetterPolicy is implemented at the client side, it creates a new retry letter topic and dead letter +topic when building a consumer with `deadLetterPolicy` configuration. When a consumer calls `reconsumeLaterCumulative` +or `reconsumeLater`, the message (method parameter) will be produced to the retry letter topic until the retry time reaches +the `maxRedeliverCount`. The message will be produced to the dead letter topic when the retry time reaches the ` +maxRedeliverCount`. The main difference between them is that `reconsumeLaterCumulative` will cumulative ack the +message (method parameter) after it is produced and `reconsumeLater` will individual ack the message (method parameter) +after it is produced. + +The `negativeAcknowledge` is used to redeliver certain unacknowledged messages while `redeliverUnacknowledgedMessages` +is used to redeliver all the unacknowledged messages received by this consumer. The main difference between them and +deadLetterPolicy is that there is no new topic created, and there is an unlimited number of redeliveries. + +## Best Practice Suggestion + +Different scenarios require different best practices. Users who value the order of partition messages and wish to batch +process data should choose or implement an appropriate routerPolicy to send a batch of ordered messages to the same +partition. They should also select either Exclusive or Failover subscription modes. For users who do not care about +message order and those in stream processing scenarios, they can opt to use Shared and Key-shared subscription modes. + +### Shared && Key-shared +#### At-least-once +The term `At least once` ensures that the message is processed at least once, and there is a risk of duplicate messages, +but the performance is better than the `Exactly once` semantics. For the `At least once` semantics in shared or Review Comment: You can't ever achieve exactly once for consumption, by only using Pulsar. It relies on your target system providing idempotency. ########## docs/tutorials-redeliver-messages.md: ########## @@ -0,0 +1,236 @@ +--- +Id: tutorials-redeliver-messages +title: Consume best practice +sidebar_label: "Consume best practice" +description: Learn how to consume messages and redeliver unacknowledged messages in Pulsar. +--- + +# Consume Best Practice + +## Background Knowledge + +### Subscription Types + +Pulsar is a distributed message system where messages can be sent to topics by producers and consumed by consumers. +Consumers can subscribe to the topics in four ways (subscription types): + +* **Exclusive** +* **Failover** +* **Shared** +* **Key-shared** + +The messages are consumed in order for a single partition in Exclusive and Failover modes and out of order for Shared and +Key-shared mode. The main difference between Exclusive and Failover is that in Exclusive mode, the consumer is exclusive +to the entire topic, while in Failover mode, the consumer is exclusive to only one partition. Failover mode allows backup +consumer connections that are not consumed. The main difference between Shared and Key-shared is whether their dispatch +strategy is implemented via a key. For more information about subscription type, refer to the [Pulsar website](https://pulsar.apache.org/docs/3.2.x/concepts-messaging/). + + +### Acknowledgment + +The messages should be acknowledged after they are fully consumed and processed, and then the messages would not be received +for the same subscription again. Pulsar provides two ways to acknowledge messages: + +* **Cumulative acknowledgment** +* **Individual acknowledgment** + +Cumulative acknowledgment receives a message or a message id as a parameter and marks the messages before the message as +consumed for this subscription. For multiple-partition topics, the cumulative acknowledgment will work for the single +partition without impacting other partitions. Individual acknowledgment receives a message or a message id as a parameter +and only marks this message as consumed for this subscription. + + +### Messages Redeliver Mechanism + +There might be instances where the received messages cannot be processed at this time or some errors happened during processing. +The client needs to redeliver the unacknowledged messages or a particular message after a delay or immediately. +Pulsar provides at-least-once semantics when the client does not enable transaction because the client may cache some +messages out of Pulsar when redelivering messages. + +Pulsar Consumer API provides four ways to reconsume the unacknowledged messages later: + +* **ackTimeout** +* **deadLetterPolicy** + * **reconsumeLaterCumulative** + * **reconsumeLater** +* **negativeAcknowledge** +* **redeliverUnacknowledgedMessages** + +The **ackTimeout** is an automatic protection mechanism. If a consumer configured ackTimeout, the messages will be +auto-redelivered when the received messages are not acknowledged after a long time. It works at the client side, ensuring +the unacknowledged messages will be redelivered to another consumer when the connection of a consumer remains active +but the business system gets stuck or misses to ack some messages. If the consumer gets disconnected when the client crashes, +the messages will be auto-redelivered by the broker too. However, this mechanism does not wait for the response of acknowledgment, +so if an acknowledgment fails on the broker side or proxy side, an ack hole may occur. + +The **deadLetterPolicy** is a policy in the message queue used to handle messages that cannot be processed properly. +In many message queue systems (such as RabbitMQ, Google Pub/Sub, Apache Pulsar, etc.), this strategy is implemented. +In Pulsar, the deadLetterPolicy is implemented at the client side, it creates a new retry letter topic and dead letter +topic when building a consumer with `deadLetterPolicy` configuration. When a consumer calls `reconsumeLaterCumulative` +or `reconsumeLater`, the message (method parameter) will be produced to the retry letter topic until the retry time reaches +the `maxRedeliverCount`. The message will be produced to the dead letter topic when the retry time reaches the ` +maxRedeliverCount`. The main difference between them is that `reconsumeLaterCumulative` will cumulative ack the +message (method parameter) after it is produced and `reconsumeLater` will individual ack the message (method parameter) +after it is produced. + +The `negativeAcknowledge` is used to redeliver certain unacknowledged messages while `redeliverUnacknowledgedMessages` +is used to redeliver all the unacknowledged messages received by this consumer. The main difference between them and +deadLetterPolicy is that there is no new topic created, and there is an unlimited number of redeliveries. + +## Best Practice Suggestion + +Different scenarios require different best practices. Users who value the order of partition messages and wish to batch +process data should choose or implement an appropriate routerPolicy to send a batch of ordered messages to the same +partition. They should also select either Exclusive or Failover subscription modes. For users who do not care about +message order and those in stream processing scenarios, they can opt to use Shared and Key-shared subscription modes. + +### Shared && Key-shared +#### At-least-once +The term `At least once` ensures that the message is processed at least once, and there is a risk of duplicate messages, +but the performance is better than the `Exactly once` semantics. For the `At least once` semantics in shared or +key-shared mode, the most important matter is avoiding ack-hole as much as possible. The ack-hole refers to instances Review Comment: I think this all paragraph and numbers below do not add any new essential information. ########## docs/tutorials-redeliver-messages.md: ########## @@ -0,0 +1,236 @@ +--- +Id: tutorials-redeliver-messages +title: Consume best practice +sidebar_label: "Consume best practice" +description: Learn how to consume messages and redeliver unacknowledged messages in Pulsar. +--- + +# Consume Best Practice + +## Background Knowledge + +### Subscription Types + +Pulsar is a distributed message system where messages can be sent to topics by producers and consumed by consumers. +Consumers can subscribe to the topics in four ways (subscription types): + +* **Exclusive** +* **Failover** +* **Shared** +* **Key-shared** + +The messages are consumed in order for a single partition in Exclusive and Failover modes and out of order for Shared and +Key-shared mode. The main difference between Exclusive and Failover is that in Exclusive mode, the consumer is exclusive +to the entire topic, while in Failover mode, the consumer is exclusive to only one partition. Failover mode allows backup +consumer connections that are not consumed. The main difference between Shared and Key-shared is whether their dispatch +strategy is implemented via a key. For more information about subscription type, refer to the [Pulsar website](https://pulsar.apache.org/docs/3.2.x/concepts-messaging/). + + +### Acknowledgment + +The messages should be acknowledged after they are fully consumed and processed, and then the messages would not be received +for the same subscription again. Pulsar provides two ways to acknowledge messages: + +* **Cumulative acknowledgment** +* **Individual acknowledgment** + +Cumulative acknowledgment receives a message or a message id as a parameter and marks the messages before the message as +consumed for this subscription. For multiple-partition topics, the cumulative acknowledgment will work for the single +partition without impacting other partitions. Individual acknowledgment receives a message or a message id as a parameter +and only marks this message as consumed for this subscription. + + +### Messages Redeliver Mechanism + +There might be instances where the received messages cannot be processed at this time or some errors happened during processing. +The client needs to redeliver the unacknowledged messages or a particular message after a delay or immediately. +Pulsar provides at-least-once semantics when the client does not enable transaction because the client may cache some +messages out of Pulsar when redelivering messages. + +Pulsar Consumer API provides four ways to reconsume the unacknowledged messages later: + +* **ackTimeout** +* **deadLetterPolicy** + * **reconsumeLaterCumulative** + * **reconsumeLater** +* **negativeAcknowledge** +* **redeliverUnacknowledgedMessages** + +The **ackTimeout** is an automatic protection mechanism. If a consumer configured ackTimeout, the messages will be +auto-redelivered when the received messages are not acknowledged after a long time. It works at the client side, ensuring +the unacknowledged messages will be redelivered to another consumer when the connection of a consumer remains active +but the business system gets stuck or misses to ack some messages. If the consumer gets disconnected when the client crashes, +the messages will be auto-redelivered by the broker too. However, this mechanism does not wait for the response of acknowledgment, +so if an acknowledgment fails on the broker side or proxy side, an ack hole may occur. + +The **deadLetterPolicy** is a policy in the message queue used to handle messages that cannot be processed properly. +In many message queue systems (such as RabbitMQ, Google Pub/Sub, Apache Pulsar, etc.), this strategy is implemented. +In Pulsar, the deadLetterPolicy is implemented at the client side, it creates a new retry letter topic and dead letter +topic when building a consumer with `deadLetterPolicy` configuration. When a consumer calls `reconsumeLaterCumulative` +or `reconsumeLater`, the message (method parameter) will be produced to the retry letter topic until the retry time reaches +the `maxRedeliverCount`. The message will be produced to the dead letter topic when the retry time reaches the ` +maxRedeliverCount`. The main difference between them is that `reconsumeLaterCumulative` will cumulative ack the +message (method parameter) after it is produced and `reconsumeLater` will individual ack the message (method parameter) +after it is produced. Review Comment: Let's clarify: 1. The consumers automatically subscribe to the retry letter topic. 2. It's important that the retry-letter topic name will include both the topic name and the subscription name, since we want to avoid having two subscription reading from the topic, writing their retries into the same retry topic. The retries is a state only in the context of your subscription. ########## docs/tutorials-redeliver-messages.md: ########## @@ -0,0 +1,236 @@ +--- +Id: tutorials-redeliver-messages +title: Consume best practice +sidebar_label: "Consume best practice" +description: Learn how to consume messages and redeliver unacknowledged messages in Pulsar. +--- + +# Consume Best Practice + +## Background Knowledge + +### Subscription Types + +Pulsar is a distributed message system where messages can be sent to topics by producers and consumed by consumers. +Consumers can subscribe to the topics in four ways (subscription types): + +* **Exclusive** +* **Failover** +* **Shared** +* **Key-shared** + +The messages are consumed in order for a single partition in Exclusive and Failover modes and out of order for Shared and +Key-shared mode. The main difference between Exclusive and Failover is that in Exclusive mode, the consumer is exclusive +to the entire topic, while in Failover mode, the consumer is exclusive to only one partition. Failover mode allows backup +consumer connections that are not consumed. The main difference between Shared and Key-shared is whether their dispatch +strategy is implemented via a key. For more information about subscription type, refer to the [Pulsar website](https://pulsar.apache.org/docs/3.2.x/concepts-messaging/). + + +### Acknowledgment + +The messages should be acknowledged after they are fully consumed and processed, and then the messages would not be received +for the same subscription again. Pulsar provides two ways to acknowledge messages: + +* **Cumulative acknowledgment** +* **Individual acknowledgment** + +Cumulative acknowledgment receives a message or a message id as a parameter and marks the messages before the message as +consumed for this subscription. For multiple-partition topics, the cumulative acknowledgment will work for the single +partition without impacting other partitions. Individual acknowledgment receives a message or a message id as a parameter +and only marks this message as consumed for this subscription. + + +### Messages Redeliver Mechanism + +There might be instances where the received messages cannot be processed at this time or some errors happened during processing. +The client needs to redeliver the unacknowledged messages or a particular message after a delay or immediately. +Pulsar provides at-least-once semantics when the client does not enable transaction because the client may cache some +messages out of Pulsar when redelivering messages. + +Pulsar Consumer API provides four ways to reconsume the unacknowledged messages later: + +* **ackTimeout** +* **deadLetterPolicy** + * **reconsumeLaterCumulative** + * **reconsumeLater** +* **negativeAcknowledge** +* **redeliverUnacknowledgedMessages** + +The **ackTimeout** is an automatic protection mechanism. If a consumer configured ackTimeout, the messages will be +auto-redelivered when the received messages are not acknowledged after a long time. It works at the client side, ensuring +the unacknowledged messages will be redelivered to another consumer when the connection of a consumer remains active +but the business system gets stuck or misses to ack some messages. If the consumer gets disconnected when the client crashes, +the messages will be auto-redelivered by the broker too. However, this mechanism does not wait for the response of acknowledgment, +so if an acknowledgment fails on the broker side or proxy side, an ack hole may occur. + +The **deadLetterPolicy** is a policy in the message queue used to handle messages that cannot be processed properly. +In many message queue systems (such as RabbitMQ, Google Pub/Sub, Apache Pulsar, etc.), this strategy is implemented. +In Pulsar, the deadLetterPolicy is implemented at the client side, it creates a new retry letter topic and dead letter +topic when building a consumer with `deadLetterPolicy` configuration. When a consumer calls `reconsumeLaterCumulative` +or `reconsumeLater`, the message (method parameter) will be produced to the retry letter topic until the retry time reaches +the `maxRedeliverCount`. The message will be produced to the dead letter topic when the retry time reaches the ` +maxRedeliverCount`. The main difference between them is that `reconsumeLaterCumulative` will cumulative ack the +message (method parameter) after it is produced and `reconsumeLater` will individual ack the message (method parameter) +after it is produced. + +The `negativeAcknowledge` is used to redeliver certain unacknowledged messages while `redeliverUnacknowledgedMessages` +is used to redeliver all the unacknowledged messages received by this consumer. The main difference between them and +deadLetterPolicy is that there is no new topic created, and there is an unlimited number of redeliveries. + +## Best Practice Suggestion + +Different scenarios require different best practices. Users who value the order of partition messages and wish to batch +process data should choose or implement an appropriate routerPolicy to send a batch of ordered messages to the same +partition. They should also select either Exclusive or Failover subscription modes. For users who do not care about +message order and those in stream processing scenarios, they can opt to use Shared and Key-shared subscription modes. Review Comment: do not care about message order *and* those in stream processing scenario (which usually do care about order) - this is a contradiction ########## docs/tutorials-redeliver-messages.md: ########## @@ -0,0 +1,236 @@ +--- +Id: tutorials-redeliver-messages +title: Consume best practice +sidebar_label: "Consume best practice" +description: Learn how to consume messages and redeliver unacknowledged messages in Pulsar. +--- + +# Consume Best Practice + +## Background Knowledge + +### Subscription Types + +Pulsar is a distributed message system where messages can be sent to topics by producers and consumed by consumers. +Consumers can subscribe to the topics in four ways (subscription types): + +* **Exclusive** +* **Failover** +* **Shared** +* **Key-shared** + +The messages are consumed in order for a single partition in Exclusive and Failover modes and out of order for Shared and +Key-shared mode. The main difference between Exclusive and Failover is that in Exclusive mode, the consumer is exclusive +to the entire topic, while in Failover mode, the consumer is exclusive to only one partition. Failover mode allows backup +consumer connections that are not consumed. The main difference between Shared and Key-shared is whether their dispatch +strategy is implemented via a key. For more information about subscription type, refer to the [Pulsar website](https://pulsar.apache.org/docs/3.2.x/concepts-messaging/). + + +### Acknowledgment + +The messages should be acknowledged after they are fully consumed and processed, and then the messages would not be received +for the same subscription again. Pulsar provides two ways to acknowledge messages: + +* **Cumulative acknowledgment** +* **Individual acknowledgment** + +Cumulative acknowledgment receives a message or a message id as a parameter and marks the messages before the message as +consumed for this subscription. For multiple-partition topics, the cumulative acknowledgment will work for the single +partition without impacting other partitions. Individual acknowledgment receives a message or a message id as a parameter +and only marks this message as consumed for this subscription. + + +### Messages Redeliver Mechanism + +There might be instances where the received messages cannot be processed at this time or some errors happened during processing. +The client needs to redeliver the unacknowledged messages or a particular message after a delay or immediately. +Pulsar provides at-least-once semantics when the client does not enable transaction because the client may cache some +messages out of Pulsar when redelivering messages. + +Pulsar Consumer API provides four ways to reconsume the unacknowledged messages later: + +* **ackTimeout** +* **deadLetterPolicy** + * **reconsumeLaterCumulative** + * **reconsumeLater** +* **negativeAcknowledge** +* **redeliverUnacknowledgedMessages** + +The **ackTimeout** is an automatic protection mechanism. If a consumer configured ackTimeout, the messages will be +auto-redelivered when the received messages are not acknowledged after a long time. It works at the client side, ensuring +the unacknowledged messages will be redelivered to another consumer when the connection of a consumer remains active +but the business system gets stuck or misses to ack some messages. If the consumer gets disconnected when the client crashes, +the messages will be auto-redelivered by the broker too. However, this mechanism does not wait for the response of acknowledgment, +so if an acknowledgment fails on the broker side or proxy side, an ack hole may occur. + +The **deadLetterPolicy** is a policy in the message queue used to handle messages that cannot be processed properly. +In many message queue systems (such as RabbitMQ, Google Pub/Sub, Apache Pulsar, etc.), this strategy is implemented. +In Pulsar, the deadLetterPolicy is implemented at the client side, it creates a new retry letter topic and dead letter +topic when building a consumer with `deadLetterPolicy` configuration. When a consumer calls `reconsumeLaterCumulative` +or `reconsumeLater`, the message (method parameter) will be produced to the retry letter topic until the retry time reaches +the `maxRedeliverCount`. The message will be produced to the dead letter topic when the retry time reaches the ` +maxRedeliverCount`. The main difference between them is that `reconsumeLaterCumulative` will cumulative ack the +message (method parameter) after it is produced and `reconsumeLater` will individual ack the message (method parameter) +after it is produced. + Review Comment: Add note: NOTE: TX are not enabled / supported for this. This means the two operations of acknowledging the original message and publishing the new message to the retry topic does not happen atomically. It can happen that the acknowledge of the original message fails, but the new message is produced, so the consumer will consume the same message twice. ########## docs/tutorials-redeliver-messages.md: ########## @@ -0,0 +1,236 @@ +--- +Id: tutorials-redeliver-messages +title: Consume best practice +sidebar_label: "Consume best practice" +description: Learn how to consume messages and redeliver unacknowledged messages in Pulsar. +--- + +# Consume Best Practice + +## Background Knowledge + +### Subscription Types + +Pulsar is a distributed message system where messages can be sent to topics by producers and consumed by consumers. +Consumers can subscribe to the topics in four ways (subscription types): + +* **Exclusive** +* **Failover** +* **Shared** +* **Key-shared** + +The messages are consumed in order for a single partition in Exclusive and Failover modes and out of order for Shared and +Key-shared mode. The main difference between Exclusive and Failover is that in Exclusive mode, the consumer is exclusive +to the entire topic, while in Failover mode, the consumer is exclusive to only one partition. Failover mode allows backup +consumer connections that are not consumed. The main difference between Shared and Key-shared is whether their dispatch +strategy is implemented via a key. For more information about subscription type, refer to the [Pulsar website](https://pulsar.apache.org/docs/3.2.x/concepts-messaging/). + + +### Acknowledgment + +The messages should be acknowledged after they are fully consumed and processed, and then the messages would not be received +for the same subscription again. Pulsar provides two ways to acknowledge messages: + +* **Cumulative acknowledgment** +* **Individual acknowledgment** + +Cumulative acknowledgment receives a message or a message id as a parameter and marks the messages before the message as +consumed for this subscription. For multiple-partition topics, the cumulative acknowledgment will work for the single +partition without impacting other partitions. Individual acknowledgment receives a message or a message id as a parameter +and only marks this message as consumed for this subscription. + + +### Messages Redeliver Mechanism + +There might be instances where the received messages cannot be processed at this time or some errors happened during processing. +The client needs to redeliver the unacknowledged messages or a particular message after a delay or immediately. +Pulsar provides at-least-once semantics when the client does not enable transaction because the client may cache some +messages out of Pulsar when redelivering messages. + +Pulsar Consumer API provides four ways to reconsume the unacknowledged messages later: + +* **ackTimeout** +* **deadLetterPolicy** + * **reconsumeLaterCumulative** + * **reconsumeLater** +* **negativeAcknowledge** +* **redeliverUnacknowledgedMessages** + +The **ackTimeout** is an automatic protection mechanism. If a consumer configured ackTimeout, the messages will be +auto-redelivered when the received messages are not acknowledged after a long time. It works at the client side, ensuring +the unacknowledged messages will be redelivered to another consumer when the connection of a consumer remains active +but the business system gets stuck or misses to ack some messages. If the consumer gets disconnected when the client crashes, +the messages will be auto-redelivered by the broker too. However, this mechanism does not wait for the response of acknowledgment, +so if an acknowledgment fails on the broker side or proxy side, an ack hole may occur. + +The **deadLetterPolicy** is a policy in the message queue used to handle messages that cannot be processed properly. +In many message queue systems (such as RabbitMQ, Google Pub/Sub, Apache Pulsar, etc.), this strategy is implemented. +In Pulsar, the deadLetterPolicy is implemented at the client side, it creates a new retry letter topic and dead letter +topic when building a consumer with `deadLetterPolicy` configuration. When a consumer calls `reconsumeLaterCumulative` +or `reconsumeLater`, the message (method parameter) will be produced to the retry letter topic until the retry time reaches +the `maxRedeliverCount`. The message will be produced to the dead letter topic when the retry time reaches the ` +maxRedeliverCount`. The main difference between them is that `reconsumeLaterCumulative` will cumulative ack the +message (method parameter) after it is produced and `reconsumeLater` will individual ack the message (method parameter) +after it is produced. + +The `negativeAcknowledge` is used to redeliver certain unacknowledged messages while `redeliverUnacknowledgedMessages` +is used to redeliver all the unacknowledged messages received by this consumer. The main difference between them and +deadLetterPolicy is that there is no new topic created, and there is an unlimited number of redeliveries. + +## Best Practice Suggestion + +Different scenarios require different best practices. Users who value the order of partition messages and wish to batch +process data should choose or implement an appropriate routerPolicy to send a batch of ordered messages to the same Review Comment: You're talking about producer best practices here? If so, I think it's out of scope for Redelivery content, no? ########## docs/tutorials-redeliver-messages.md: ########## @@ -0,0 +1,236 @@ +--- +Id: tutorials-redeliver-messages +title: Consume best practice +sidebar_label: "Consume best practice" +description: Learn how to consume messages and redeliver unacknowledged messages in Pulsar. +--- + +# Consume Best Practice + +## Background Knowledge + +### Subscription Types + +Pulsar is a distributed message system where messages can be sent to topics by producers and consumed by consumers. +Consumers can subscribe to the topics in four ways (subscription types): + +* **Exclusive** +* **Failover** +* **Shared** +* **Key-shared** + +The messages are consumed in order for a single partition in Exclusive and Failover modes and out of order for Shared and +Key-shared mode. The main difference between Exclusive and Failover is that in Exclusive mode, the consumer is exclusive +to the entire topic, while in Failover mode, the consumer is exclusive to only one partition. Failover mode allows backup +consumer connections that are not consumed. The main difference between Shared and Key-shared is whether their dispatch +strategy is implemented via a key. For more information about subscription type, refer to the [Pulsar website](https://pulsar.apache.org/docs/3.2.x/concepts-messaging/). + + +### Acknowledgment + +The messages should be acknowledged after they are fully consumed and processed, and then the messages would not be received +for the same subscription again. Pulsar provides two ways to acknowledge messages: + +* **Cumulative acknowledgment** +* **Individual acknowledgment** + +Cumulative acknowledgment receives a message or a message id as a parameter and marks the messages before the message as +consumed for this subscription. For multiple-partition topics, the cumulative acknowledgment will work for the single +partition without impacting other partitions. Individual acknowledgment receives a message or a message id as a parameter +and only marks this message as consumed for this subscription. + + +### Messages Redeliver Mechanism + +There might be instances where the received messages cannot be processed at this time or some errors happened during processing. +The client needs to redeliver the unacknowledged messages or a particular message after a delay or immediately. +Pulsar provides at-least-once semantics when the client does not enable transaction because the client may cache some +messages out of Pulsar when redelivering messages. + +Pulsar Consumer API provides four ways to reconsume the unacknowledged messages later: + +* **ackTimeout** +* **deadLetterPolicy** + * **reconsumeLaterCumulative** + * **reconsumeLater** +* **negativeAcknowledge** +* **redeliverUnacknowledgedMessages** + +The **ackTimeout** is an automatic protection mechanism. If a consumer configured ackTimeout, the messages will be +auto-redelivered when the received messages are not acknowledged after a long time. It works at the client side, ensuring +the unacknowledged messages will be redelivered to another consumer when the connection of a consumer remains active +but the business system gets stuck or misses to ack some messages. If the consumer gets disconnected when the client crashes, +the messages will be auto-redelivered by the broker too. However, this mechanism does not wait for the response of acknowledgment, +so if an acknowledgment fails on the broker side or proxy side, an ack hole may occur. + +The **deadLetterPolicy** is a policy in the message queue used to handle messages that cannot be processed properly. +In many message queue systems (such as RabbitMQ, Google Pub/Sub, Apache Pulsar, etc.), this strategy is implemented. +In Pulsar, the deadLetterPolicy is implemented at the client side, it creates a new retry letter topic and dead letter +topic when building a consumer with `deadLetterPolicy` configuration. When a consumer calls `reconsumeLaterCumulative` +or `reconsumeLater`, the message (method parameter) will be produced to the retry letter topic until the retry time reaches +the `maxRedeliverCount`. The message will be produced to the dead letter topic when the retry time reaches the ` +maxRedeliverCount`. The main difference between them is that `reconsumeLaterCumulative` will cumulative ack the +message (method parameter) after it is produced and `reconsumeLater` will individual ack the message (method parameter) +after it is produced. + +The `negativeAcknowledge` is used to redeliver certain unacknowledged messages while `redeliverUnacknowledgedMessages` +is used to redeliver all the unacknowledged messages received by this consumer. The main difference between them and +deadLetterPolicy is that there is no new topic created, and there is an unlimited number of redeliveries. + +## Best Practice Suggestion + +Different scenarios require different best practices. Users who value the order of partition messages and wish to batch +process data should choose or implement an appropriate routerPolicy to send a batch of ordered messages to the same +partition. They should also select either Exclusive or Failover subscription modes. For users who do not care about +message order and those in stream processing scenarios, they can opt to use Shared and Key-shared subscription modes. + +### Shared && Key-shared +#### At-least-once +The term `At least once` ensures that the message is processed at least once, and there is a risk of duplicate messages, +but the performance is better than the `Exactly once` semantics. For the `At least once` semantics in shared or +key-shared mode, the most important matter is avoiding ack-hole as much as possible. The ack-hole refers to instances +where some single messages are missed for acknowledgment. This can occur in the following cases (All of these are the cases +where the connection is not interrupted, and reconnection will automatically resend the message.): + +1. Consumer receives messages but fails to process due to business system error. +2. Consumer receives messages but misses to process or the business system gets stuck. +3. Consumer acknowledges the message, but the acknowledge request is lost when sending to the broker. The possibility of +4. data loss exists in the case of TCP long connections, although the probability is extremely low. +  + +For case 1, configuring `deadLetterPolicy` is a good solution. When the business system receives a signal of message +processing failure, it can immediately check and decide whether to retry after a period of time. After they call +`reconsumeLater` API, the message will be acknowledged and resent to the retry letter topic that is automatically +subscribed by the consumer. When it reaches `maxRedeliverCount`, the message will be sent to the dead letter topic. +Messages sent to the dead letter topic should be considered as non-retryable messages, and we recommend setting an +initialSubscriptionName to avoid being deleted by the retention policy and then, let maintenance personnel regularly +handle non-retryable messages in the dead letter topic. + +```java + Consumer<Integer> consumer = pulsarClient.newConsumer(Schema.INT32) + .deadLetterPolicy(DeadLetterPolicy.builder() + .maxRedeliverCount(maxRedeliverCount) + .deadLetterTopic(deadLetterTopic) + .retryLetterTopic(retryLetterTopic) + .initialSubscriptionName(initialSubscriptionName) + .build()) + .subscriptionType(SubscriptionType.Shared) + .enableRetry(true) + .subscriptionInitialPosition(SubscriptionInitialPosition.Earliest) + .topic(topic) + .subscriptionName(sub) + .subscribe(); + Message<Integer> message = consumer.receive(); + try{ + // Process message + consumer.acknowledge(message); + }catch(Exception e){ + // Check whether to redeliver the message again. + if(e instanceof RetryException){ + consumer.reconsumeLater(message, delay, timeunit); + } + consumer.reconsumeLater(message, delay, timeunit); + } +``` + +For case 2, where the business system gets stuck or an error occurs which causes the received message to be missed for +processing, configuring ack timeout is a good solution. The consumer will record every message received on the client side. +If these messages have not been acknowledged after the specified time, the consumer will request the broker to resend these +messages to other brokers. +**Suggestion:** The ack timeout should be set slightly longer based on the message processing speed of the business system. +If the business system is still processing messages, but the processing time is too long or the timeout is set too small, +it may result in duplicate message consumption. + +```java + Consumer<Integer> consumer = pulsarClient.newConsumer(Schema.INT32) + .ackTimeout(tiemout, timeunit) + .subscriptionType(SubscriptionType.Shared) + .subscriptionInitialPosition(SubscriptionInitialPosition.Earliest) + .topic(topic) + .subscriptionName(sub) + .subscribe(); +``` + +For case 3, there are no effective preventive measures. This is because all methods of redelivery are triggered by the +client when the connection is not disconnected, and the client does not wait for an ack response by default. + +`Short-term solution`: This should be an extreme case where users can unload the topic to resolve after observing +an abnormal ack hole (existing for more than 3 * ack time out). +`Long-term solution`: Modify the ack-timeout mechanism to wait for the acknowledgment response. + +#### Exactly-once +If the users cannot tolerate message repetition, they can acknowledge messages with a transaction. Transaction can +prevent repeated consumption. If a message has been acknowledged, it will wait for a response and throw +`TransactionConflictException` when the client acknowledges the message with a transaction. + +**Notices:** When using transactions, do not configure DeadLetterPolicy, but instead use negativeAcknowledge to resend messages. Review Comment: Why? ########## docs/tutorials-redeliver-messages.md: ########## @@ -0,0 +1,236 @@ +--- +Id: tutorials-redeliver-messages +title: Consume best practice +sidebar_label: "Consume best practice" +description: Learn how to consume messages and redeliver unacknowledged messages in Pulsar. +--- + +# Consume Best Practice + +## Background Knowledge + +### Subscription Types + +Pulsar is a distributed message system where messages can be sent to topics by producers and consumed by consumers. +Consumers can subscribe to the topics in four ways (subscription types): + +* **Exclusive** +* **Failover** +* **Shared** +* **Key-shared** + +The messages are consumed in order for a single partition in Exclusive and Failover modes and out of order for Shared and +Key-shared mode. The main difference between Exclusive and Failover is that in Exclusive mode, the consumer is exclusive +to the entire topic, while in Failover mode, the consumer is exclusive to only one partition. Failover mode allows backup +consumer connections that are not consumed. The main difference between Shared and Key-shared is whether their dispatch +strategy is implemented via a key. For more information about subscription type, refer to the [Pulsar website](https://pulsar.apache.org/docs/3.2.x/concepts-messaging/). + + +### Acknowledgment + +The messages should be acknowledged after they are fully consumed and processed, and then the messages would not be received +for the same subscription again. Pulsar provides two ways to acknowledge messages: + +* **Cumulative acknowledgment** +* **Individual acknowledgment** + +Cumulative acknowledgment receives a message or a message id as a parameter and marks the messages before the message as +consumed for this subscription. For multiple-partition topics, the cumulative acknowledgment will work for the single +partition without impacting other partitions. Individual acknowledgment receives a message or a message id as a parameter +and only marks this message as consumed for this subscription. + + +### Messages Redeliver Mechanism + +There might be instances where the received messages cannot be processed at this time or some errors happened during processing. +The client needs to redeliver the unacknowledged messages or a particular message after a delay or immediately. +Pulsar provides at-least-once semantics when the client does not enable transaction because the client may cache some +messages out of Pulsar when redelivering messages. + +Pulsar Consumer API provides four ways to reconsume the unacknowledged messages later: + +* **ackTimeout** +* **deadLetterPolicy** + * **reconsumeLaterCumulative** + * **reconsumeLater** +* **negativeAcknowledge** +* **redeliverUnacknowledgedMessages** + +The **ackTimeout** is an automatic protection mechanism. If a consumer configured ackTimeout, the messages will be +auto-redelivered when the received messages are not acknowledged after a long time. It works at the client side, ensuring +the unacknowledged messages will be redelivered to another consumer when the connection of a consumer remains active +but the business system gets stuck or misses to ack some messages. If the consumer gets disconnected when the client crashes, +the messages will be auto-redelivered by the broker too. However, this mechanism does not wait for the response of acknowledgment, +so if an acknowledgment fails on the broker side or proxy side, an ack hole may occur. + +The **deadLetterPolicy** is a policy in the message queue used to handle messages that cannot be processed properly. +In many message queue systems (such as RabbitMQ, Google Pub/Sub, Apache Pulsar, etc.), this strategy is implemented. +In Pulsar, the deadLetterPolicy is implemented at the client side, it creates a new retry letter topic and dead letter +topic when building a consumer with `deadLetterPolicy` configuration. When a consumer calls `reconsumeLaterCumulative` +or `reconsumeLater`, the message (method parameter) will be produced to the retry letter topic until the retry time reaches +the `maxRedeliverCount`. The message will be produced to the dead letter topic when the retry time reaches the ` +maxRedeliverCount`. The main difference between them is that `reconsumeLaterCumulative` will cumulative ack the +message (method parameter) after it is produced and `reconsumeLater` will individual ack the message (method parameter) +after it is produced. + +The `negativeAcknowledge` is used to redeliver certain unacknowledged messages while `redeliverUnacknowledgedMessages` +is used to redeliver all the unacknowledged messages received by this consumer. The main difference between them and +deadLetterPolicy is that there is no new topic created, and there is an unlimited number of redeliveries. + +## Best Practice Suggestion + +Different scenarios require different best practices. Users who value the order of partition messages and wish to batch +process data should choose or implement an appropriate routerPolicy to send a batch of ordered messages to the same +partition. They should also select either Exclusive or Failover subscription modes. For users who do not care about +message order and those in stream processing scenarios, they can opt to use Shared and Key-shared subscription modes. + +### Shared && Key-shared +#### At-least-once +The term `At least once` ensures that the message is processed at least once, and there is a risk of duplicate messages, +but the performance is better than the `Exactly once` semantics. For the `At least once` semantics in shared or +key-shared mode, the most important matter is avoiding ack-hole as much as possible. The ack-hole refers to instances +where some single messages are missed for acknowledgment. This can occur in the following cases (All of these are the cases +where the connection is not interrupted, and reconnection will automatically resend the message.): + +1. Consumer receives messages but fails to process due to business system error. +2. Consumer receives messages but misses to process or the business system gets stuck. +3. Consumer acknowledges the message, but the acknowledge request is lost when sending to the broker. The possibility of +4. data loss exists in the case of TCP long connections, although the probability is extremely low. +  + +For case 1, configuring `deadLetterPolicy` is a good solution. When the business system receives a signal of message +processing failure, it can immediately check and decide whether to retry after a period of time. After they call +`reconsumeLater` API, the message will be acknowledged and resent to the retry letter topic that is automatically +subscribed by the consumer. When it reaches `maxRedeliverCount`, the message will be sent to the dead letter topic. +Messages sent to the dead letter topic should be considered as non-retryable messages, and we recommend setting an +initialSubscriptionName to avoid being deleted by the retention policy and then, let maintenance personnel regularly +handle non-retryable messages in the dead letter topic. + +```java + Consumer<Integer> consumer = pulsarClient.newConsumer(Schema.INT32) + .deadLetterPolicy(DeadLetterPolicy.builder() + .maxRedeliverCount(maxRedeliverCount) + .deadLetterTopic(deadLetterTopic) + .retryLetterTopic(retryLetterTopic) + .initialSubscriptionName(initialSubscriptionName) + .build()) + .subscriptionType(SubscriptionType.Shared) + .enableRetry(true) + .subscriptionInitialPosition(SubscriptionInitialPosition.Earliest) + .topic(topic) + .subscriptionName(sub) + .subscribe(); + Message<Integer> message = consumer.receive(); + try{ + // Process message + consumer.acknowledge(message); + }catch(Exception e){ + // Check whether to redeliver the message again. + if(e instanceof RetryException){ + consumer.reconsumeLater(message, delay, timeunit); + } + consumer.reconsumeLater(message, delay, timeunit); + } +``` + +For case 2, where the business system gets stuck or an error occurs which causes the received message to be missed for +processing, configuring ack timeout is a good solution. The consumer will record every message received on the client side. +If these messages have not been acknowledged after the specified time, the consumer will request the broker to resend these +messages to other brokers. +**Suggestion:** The ack timeout should be set slightly longer based on the message processing speed of the business system. +If the business system is still processing messages, but the processing time is too long or the timeout is set too small, +it may result in duplicate message consumption. + +```java + Consumer<Integer> consumer = pulsarClient.newConsumer(Schema.INT32) + .ackTimeout(tiemout, timeunit) + .subscriptionType(SubscriptionType.Shared) + .subscriptionInitialPosition(SubscriptionInitialPosition.Earliest) + .topic(topic) + .subscriptionName(sub) + .subscribe(); +``` + +For case 3, there are no effective preventive measures. This is because all methods of redelivery are triggered by the +client when the connection is not disconnected, and the client does not wait for an ack response by default. + +`Short-term solution`: This should be an extreme case where users can unload the topic to resolve after observing +an abnormal ack hole (existing for more than 3 * ack time out). +`Long-term solution`: Modify the ack-timeout mechanism to wait for the acknowledgment response. + +#### Exactly-once +If the users cannot tolerate message repetition, they can acknowledge messages with a transaction. Transaction can Review Comment: I don't understand why. @codelipenghui ? ########## docs/tutorials-redeliver-messages.md: ########## @@ -0,0 +1,236 @@ +--- +Id: tutorials-redeliver-messages +title: Consume best practice +sidebar_label: "Consume best practice" +description: Learn how to consume messages and redeliver unacknowledged messages in Pulsar. +--- + +# Consume Best Practice + +## Background Knowledge + +### Subscription Types + +Pulsar is a distributed message system where messages can be sent to topics by producers and consumed by consumers. +Consumers can subscribe to the topics in four ways (subscription types): + +* **Exclusive** +* **Failover** +* **Shared** +* **Key-shared** + +The messages are consumed in order for a single partition in Exclusive and Failover modes and out of order for Shared and +Key-shared mode. The main difference between Exclusive and Failover is that in Exclusive mode, the consumer is exclusive +to the entire topic, while in Failover mode, the consumer is exclusive to only one partition. Failover mode allows backup +consumer connections that are not consumed. The main difference between Shared and Key-shared is whether their dispatch +strategy is implemented via a key. For more information about subscription type, refer to the [Pulsar website](https://pulsar.apache.org/docs/3.2.x/concepts-messaging/). + + +### Acknowledgment + +The messages should be acknowledged after they are fully consumed and processed, and then the messages would not be received +for the same subscription again. Pulsar provides two ways to acknowledge messages: + +* **Cumulative acknowledgment** +* **Individual acknowledgment** + +Cumulative acknowledgment receives a message or a message id as a parameter and marks the messages before the message as +consumed for this subscription. For multiple-partition topics, the cumulative acknowledgment will work for the single +partition without impacting other partitions. Individual acknowledgment receives a message or a message id as a parameter +and only marks this message as consumed for this subscription. + + +### Messages Redeliver Mechanism + +There might be instances where the received messages cannot be processed at this time or some errors happened during processing. +The client needs to redeliver the unacknowledged messages or a particular message after a delay or immediately. +Pulsar provides at-least-once semantics when the client does not enable transaction because the client may cache some +messages out of Pulsar when redelivering messages. + +Pulsar Consumer API provides four ways to reconsume the unacknowledged messages later: + +* **ackTimeout** +* **deadLetterPolicy** + * **reconsumeLaterCumulative** + * **reconsumeLater** +* **negativeAcknowledge** +* **redeliverUnacknowledgedMessages** + +The **ackTimeout** is an automatic protection mechanism. If a consumer configured ackTimeout, the messages will be +auto-redelivered when the received messages are not acknowledged after a long time. It works at the client side, ensuring +the unacknowledged messages will be redelivered to another consumer when the connection of a consumer remains active +but the business system gets stuck or misses to ack some messages. If the consumer gets disconnected when the client crashes, +the messages will be auto-redelivered by the broker too. However, this mechanism does not wait for the response of acknowledgment, +so if an acknowledgment fails on the broker side or proxy side, an ack hole may occur. + +The **deadLetterPolicy** is a policy in the message queue used to handle messages that cannot be processed properly. +In many message queue systems (such as RabbitMQ, Google Pub/Sub, Apache Pulsar, etc.), this strategy is implemented. +In Pulsar, the deadLetterPolicy is implemented at the client side, it creates a new retry letter topic and dead letter +topic when building a consumer with `deadLetterPolicy` configuration. When a consumer calls `reconsumeLaterCumulative` +or `reconsumeLater`, the message (method parameter) will be produced to the retry letter topic until the retry time reaches +the `maxRedeliverCount`. The message will be produced to the dead letter topic when the retry time reaches the ` +maxRedeliverCount`. The main difference between them is that `reconsumeLaterCumulative` will cumulative ack the +message (method parameter) after it is produced and `reconsumeLater` will individual ack the message (method parameter) +after it is produced. + +The `negativeAcknowledge` is used to redeliver certain unacknowledged messages while `redeliverUnacknowledgedMessages` +is used to redeliver all the unacknowledged messages received by this consumer. The main difference between them and +deadLetterPolicy is that there is no new topic created, and there is an unlimited number of redeliveries. + +## Best Practice Suggestion + +Different scenarios require different best practices. Users who value the order of partition messages and wish to batch +process data should choose or implement an appropriate routerPolicy to send a batch of ordered messages to the same +partition. They should also select either Exclusive or Failover subscription modes. For users who do not care about +message order and those in stream processing scenarios, they can opt to use Shared and Key-shared subscription modes. + +### Shared && Key-shared +#### At-least-once +The term `At least once` ensures that the message is processed at least once, and there is a risk of duplicate messages, +but the performance is better than the `Exactly once` semantics. For the `At least once` semantics in shared or +key-shared mode, the most important matter is avoiding ack-hole as much as possible. The ack-hole refers to instances +where some single messages are missed for acknowledgment. This can occur in the following cases (All of these are the cases +where the connection is not interrupted, and reconnection will automatically resend the message.): + +1. Consumer receives messages but fails to process due to business system error. +2. Consumer receives messages but misses to process or the business system gets stuck. +3. Consumer acknowledges the message, but the acknowledge request is lost when sending to the broker. The possibility of +4. data loss exists in the case of TCP long connections, although the probability is extremely low. +  + +For case 1, configuring `deadLetterPolicy` is a good solution. When the business system receives a signal of message +processing failure, it can immediately check and decide whether to retry after a period of time. After they call +`reconsumeLater` API, the message will be acknowledged and resent to the retry letter topic that is automatically +subscribed by the consumer. When it reaches `maxRedeliverCount`, the message will be sent to the dead letter topic. +Messages sent to the dead letter topic should be considered as non-retryable messages, and we recommend setting an Review Comment: This should be specified above. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
