This is an automated email from the ASF dual-hosted git repository.

daojun pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/pulsar.git


The following commit(s) were added to refs/heads/master by this push:
     new 74689280c43 [improve][pip] PIP-327: Support force topic loading for 
unrecoverable errors (#21752)
74689280c43 is described below

commit 74689280c4385f7318cede46c95b034cd02686f9
Author: Rajan Dhabalia <[email protected]>
AuthorDate: Mon May 13 23:39:14 2024 -0700

    [improve][pip] PIP-327: Support force topic loading for unrecoverable 
errors (#21752)
    
    Co-authored-by: 道君 <[email protected]>
---
 pip/pip-327.md | 42 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 42 insertions(+)

diff --git a/pip/pip-327.md b/pip/pip-327.md
new file mode 100644
index 00000000000..00b3de424f1
--- /dev/null
+++ b/pip/pip-327.md
@@ -0,0 +1,42 @@
+# PIP-327: Support force topic loading for unrecoverable errors
+
+# Motivation
+
+As discussed in Issue: https://github.com/apache/pulsar/issues/21751
+
+We introduced a configuration called `autoSkipNonRecoverableData` before 
open-sourcing Pulsar because we had come across various situations in which it 
was not possible to recover ledgers belonging to a managed-ledger or 
managed-cursor, and the broker was not able to load the topics. In such 
situations, the `autoSkipNonRecoverableData` flag helps to skip non-recoverable 
ledger-recovery errors such as ledger_not_found and allows the broker to load 
topics by skipping such ledgers in disaster recovery.
+
+Brokers can recognize such non-recoverable errors using BookKeeper error codes, 
but in some cases it is not possible to conclude that an error is 
non-recoverable. For example, the broker cannot tell whether all the ensemble 
bookies of a ledger are temporarily unavailable or have been permanently 
removed from the cluster without graceful recovery, and because of that the 
broker doesn't consider the deletion of all the bookies a non-recoverable error 
though we can not recover ledge [...]
+
+Therefore, the system admin should have a dynamic configuration called 
`managedLedgerForceRecovery` to use in such situations to allow brokers to 
forcefully load topics by skipping ledger failures to avoid topic 
unavailability and perform auto repairs of the topics. This will allow the 
admin to handle disaster recovery situations in a controlled and automated 
manner and maintain the topic availability by mitigating such failures. 
+
+
+
+# Goals
+
+Support forced topic loading and recovery in unrecoverable situations, where 
the broker can skip unrecoverable ledgers whose BookKeeper error codes are 
inconclusive.
+
+
+## Design & Implementation Details
+
+### (1) Broker Changes
+
+The broker will have a new configuration, `managedLedgerForceRecovery`. If this 
flag is enabled, the managed ledger will ignore any kind of failure the broker 
sees while recovering a managed-ledger or managed-cursor.
+
+# Security Considerations
+<!--
+A detailed description of the security details that ought to be considered for 
the PIP. This is most relevant for any new HTTP endpoints, new Pulsar Protocol 
Commands, and new security features. The goal is to describe details like which 
role will have permission to perform an action.
+
+An important aspect to consider is also multi-tenancy: Does the feature I'm 
adding have the permissions / roles set in such a way that prevent one tenant 
accessing another tenant's data/configuration? For example, the Admin API to 
read a specific message for a topic only allows a client to read messages for 
the target topic. However, that was not always the case. CVE-2021-41571 
(https://github.com/apache/pulsar/wiki/CVE-2021-41571) resulted because the API 
was incorrectly written and did [...]
+
+If there is uncertainty for this section, please submit the PIP and request 
for feedback on the mailing list.
+-->
+
+
+# General Notes
+
+# Links
+
+Issue: https://github.com/apache/pulsar/issues/21751
+Discuss thread: 
https://lists.apache.org/thread/w7w91xztdyy07otw0dh71nl2rn3yy45p
+Vote thread: https://lists.apache.org/thread/hh9t6nz0pqjo7tbfn12nbwtylrvq4f43
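
The skip decision described under "Broker Changes" can be illustrated with a small sketch. This is not the broker's actual implementation: the class, enum, and method names below are hypothetical, and the three-valued `OpenResult` is a deliberate simplification of BookKeeper's error codes. It only shows the intended difference between the two flags: `autoSkipNonRecoverableData` skips errors that are known to be non-recoverable, while `managedLedgerForceRecovery` skips any failure.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these names come from the Pulsar code base.
public class ForceRecoverySketch {

    // Simplified outcome of opening one ledger during recovery (assumption).
    enum OpenResult { OK, LEDGER_NOT_FOUND, BOOKIES_UNAVAILABLE }

    // Decide whether a failed ledger may be skipped so the topic can still load.
    static boolean shouldSkip(OpenResult result,
                              boolean autoSkipNonRecoverableData,
                              boolean managedLedgerForceRecovery) {
        if (result == OpenResult.OK) {
            return false; // nothing failed, nothing to skip
        }
        if (managedLedgerForceRecovery) {
            return true; // force recovery: ignore any kind of failure
        }
        // Otherwise only errors known to be non-recoverable are skipped.
        return autoSkipNonRecoverableData && result == OpenResult.LEDGER_NOT_FOUND;
    }

    // Returns the ids of the ledgers that were kept; throws if a failure
    // is seen that the current flags do not allow skipping.
    static List<Long> recover(long[] ledgerIds, OpenResult[] results,
                              boolean autoSkip, boolean forceRecovery) {
        List<Long> kept = new ArrayList<>();
        for (int i = 0; i < ledgerIds.length; i++) {
            if (results[i] == OpenResult.OK) {
                kept.add(ledgerIds[i]);
            } else if (!shouldSkip(results[i], autoSkip, forceRecovery)) {
                throw new IllegalStateException(
                        "cannot recover ledger " + ledgerIds[i] + ": " + results[i]);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        long[] ids = {1L, 2L, 3L};
        OpenResult[] results = {
            OpenResult.OK, OpenResult.BOOKIES_UNAVAILABLE, OpenResult.OK
        };
        // With force recovery enabled, the ledger on lost bookies is skipped.
        System.out.println(recover(ids, results, true, true));
    }
}
```

With only `autoSkipNonRecoverableData` set, the same input would fail, because an "all bookies unavailable" error cannot be classified as non-recoverable; that is the gap the new flag is meant to close.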
