This is an automated email from the ASF dual-hosted git repository.
daojun pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/pulsar.git
The following commit(s) were added to refs/heads/master by this push:
new 74689280c43 [improve][pip] PIP-327: Support force topic loading for
unrecoverable errors (#21752)
74689280c43 is described below
commit 74689280c4385f7318cede46c95b034cd02686f9
Author: Rajan Dhabalia <[email protected]>
AuthorDate: Mon May 13 23:39:14 2024 -0700
[improve][pip] PIP-327: Support force topic loading for unrecoverable
errors (#21752)
Co-authored-by: 道君 <[email protected]>
---
pip/pip-327.md | 42 ++++++++++++++++++++++++++++++++++++++++++
1 file changed, 42 insertions(+)
diff --git a/pip/pip-327.md b/pip/pip-327.md
new file mode 100644
index 00000000000..00b3de424f1
--- /dev/null
+++ b/pip/pip-327.md
@@ -0,0 +1,42 @@
+# PIP-327: Support force topic loading for unrecoverable errors
+
+# Motivation
+
+As discussed in Issue: https://github.com/apache/pulsar/issues/21751
+
We introduced a configuration called `autoSkipNonRecoverableData` before
open-sourcing Pulsar because we came across various situations where it was
not possible to recover the ledgers belonging to managed ledgers or managed
cursors, and the broker was not able to load the topics. In such situations,
the `autoSkipNonRecoverableData` flag helps skip non-recoverable
ledger-recovery errors such as ledger_not_found and allows the broker to load
topics by skipping such ledgers during disaster recovery.
+
Brokers can recognize such non-recoverable errors using BookKeeper error codes,
but in some cases it is difficult or impossible to conclude that an error is
non-recoverable. For example, the broker cannot differentiate between the case
where all the ensemble bookies of a ledger are temporarily unavailable and the
case where they have been permanently removed from the cluster without graceful
recovery. Because of that, the broker doesn’t treat the deletion of all the
bookies as a non-recoverable error, though we cannot recover ledge [...]
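+
The Java sketch below illustrates the classification gap described above. It
assumes a hypothetical `LedgerFailure` enum standing in for BookKeeper error
codes (these are not BookKeeper's real `BKException` codes) and a hypothetical
`shouldSkip` helper that models `autoSkipNonRecoverableData`: a ledger is
skipped only when its failure is conclusively permanent, so the ambiguous
"all ensemble bookies unavailable" case is never skipped.
+
```java
import java.util.EnumSet;
import java.util.Set;
// Hypothetical sketch of the error classification described above. The enum values are
// illustrative stand-ins, not BookKeeper's real BKException codes.
public class NonRecoverableClassifier {
    // Failure kinds a broker might observe while recovering a managed-ledger or cursor ledger.
    enum LedgerFailure {
        LEDGER_NOT_FOUND,                 // metadata confirms the ledger no longer exists
        ALL_ENSEMBLE_BOOKIES_UNAVAILABLE, // bookies may be down temporarily or removed for good
        READ_TIMEOUT                      // typically transient
    }
    // Only failures that are conclusively permanent count as non-recoverable.
    private static final Set<LedgerFailure> NON_RECOVERABLE = EnumSet.of(LedgerFailure.LEDGER_NOT_FOUND);
    // Models autoSkipNonRecoverableData: skip a ledger only when the failure is provably permanent.
    static boolean shouldSkip(boolean autoSkipNonRecoverableData, LedgerFailure failure) {
        return autoSkipNonRecoverableData && NON_RECOVERABLE.contains(failure);
    }
    public static void main(String[] args) {
        // A missing ledger is skipped, so the topic could load.
        System.out.println(shouldSkip(true, LedgerFailure.LEDGER_NOT_FOUND));
        // Unavailable bookies are ambiguous, so the ledger is not skipped.
        System.out.println(shouldSkip(true, LedgerFailure.ALL_ENSEMBLE_BOOKIES_UNAVAILABLE));
    }
}
```
+
In this model, the second case is exactly the one where the broker keeps
failing to load the topic even though an admin may know the bookies are gone
for good.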
+
Therefore, the system admin should have a dynamic configuration called
`managedLedgerForceRecovery` that can be enabled in such situations to let
brokers forcefully load topics by skipping ledger failures, avoiding topic
unavailability and performing automatic repairs of the topics. This will allow
the admin to handle disaster recovery situations in a controlled and automated
manner and maintain topic availability by mitigating such failures.
+
+
+
+# Goals
+
Support forced topic loading and recovery in unrecoverable situations, where the
broker can skip unrecoverable failures even when the BookKeeper error codes are
ambiguous.
+
+
+## Design & Implementation Details
+
+### (1) Broker Changes
+
The broker will have a new configuration, `managedLedgerForceRecovery`. If this
flag is enabled, the managed ledger will ignore any kind of failure the broker
sees while recovering a managed ledger or managed cursor.
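+
A minimal Java sketch of the proposed behavior, assuming a simplified model
where recovery walks a list of ledger IDs and a map records which ledgers
failed to open. `ForceRecoverySketch`, `recover`, and
`RecoveryFailedException` are hypothetical names, not Pulsar's managed-ledger
code; the `forceRecovery` parameter stands in for `managedLedgerForceRecovery`.
With the flag enabled, any failed ledger is skipped regardless of the error;
without it, this sketch treats every failure as fatal.
+
```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
// Hypothetical sketch of the proposed managedLedgerForceRecovery behavior. The class, method,
// and exception names are illustrative; this is not Pulsar's ManagedLedgerImpl code.
public class ForceRecoverySketch {
    // Signals that recovery stopped and the topic would fail to load.
    static class RecoveryFailedException extends RuntimeException {
        RecoveryFailedException(String message) {
            super(message);
        }
    }
    // Recovers ledgers in order. `failures` maps a ledger ID to the error seen while opening it;
    // IDs absent from the map open successfully. Returns the ledgers kept in the managed ledger.
    static List<Long> recover(List<Long> ledgerIds, Map<Long, String> failures, boolean forceRecovery) {
        List<Long> kept = new ArrayList<>();
        for (Long id : ledgerIds) {
            String error = failures.get(id);
            if (error == null) {
                kept.add(id);
            } else if (forceRecovery) {
                // Force recovery: skip the ledger whatever the error, so the topic can still load.
                System.out.println("Skipping ledger " + id + ": " + error);
            } else {
                // Without force recovery, this sketch treats every failure as fatal
                // (autoSkipNonRecoverableData is not modeled here).
                throw new RecoveryFailedException("Cannot recover ledger " + id + ": " + error);
            }
        }
        return kept;
    }
    public static void main(String[] args) {
        List<Long> ledgers = List.of(1L, 2L, 3L);
        Map<Long, String> failures = Map.of(2L, "all ensemble bookies unavailable");
        System.out.println(recover(ledgers, failures, true)); // [1, 3]
        try {
            recover(ledgers, failures, false);
        } catch (RecoveryFailedException e) {
            System.out.println("Topic load failed: " + e.getMessage());
        }
    }
}
```
+
The trade-off is explicit: force recovery restores availability at the cost
of dropping whatever data the skipped ledgers held, which is why it is meant
to be an admin-controlled switch rather than the default.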
+
+# Security Considerations
+<!--
+A detailed description of the security details that ought to be considered for
the PIP. This is most relevant for any new HTTP endpoints, new Pulsar Protocol
Commands, and new security features. The goal is to describe details like which
role will have permission to perform an action.
+
+An important aspect to consider is also multi-tenancy: Does the feature I'm
adding have the permissions / roles set in such a way that prevent one tenant
accessing another tenant's data/configuration? For example, the Admin API to
read a specific message for a topic only allows a client to read messages for
the target topic. However, that was not always the case. CVE-2021-41571
(https://github.com/apache/pulsar/wiki/CVE-2021-41571) resulted because the API
was incorrectly written and did [...]
+
+If there is uncertainty for this section, please submit the PIP and request
for feedback on the mailing list.
+-->
+
+
+# General Notes
+
+# Links
+
+Issue: https://github.com/apache/pulsar/issues/21751
+Discuss thread: https://lists.apache.org/thread/w7w91xztdyy07otw0dh71nl2rn3yy45p
+Vote thread: https://lists.apache.org/thread/hh9t6nz0pqjo7tbfn12nbwtylrvq4f43