This is an automated email from the ASF dual-hosted git repository.
lhotari pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/pulsar.git
The following commit(s) were added to refs/heads/master by this push:
new ce38ee2bccb [improve][pip] PIP-369: Flag based selective unload on
changing ns-isolation-policy (#23116)
ce38ee2bccb is described below
commit ce38ee2bccba1f9e1687b53722e175b45b296e76
Author: Anshul Singh <[email protected]>
AuthorDate: Wed Aug 14 19:34:58 2024 +0530
[improve][pip] PIP-369: Flag based selective unload on changing
ns-isolation-policy (#23116)
---
pip/pip-369.md | 124 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 124 insertions(+)
diff --git a/pip/pip-369.md b/pip/pip-369.md
new file mode 100644
index 00000000000..9aeb598110d
--- /dev/null
+++ b/pip/pip-369.md
@@ -0,0 +1,124 @@
+# PIP-369: Flag based selective unload on changing ns-isolation-policy
+
+# Background knowledge
+
+In Apache Pulsar, namespace isolation policies are used to limit the ownership
of certain subsets of namespaces to specific broker groups.
+These policies are defined using regular expressions to match namespaces and
specify primary and secondary broker groups along with failover policy
configurations.
+This ensures that the ownership of the specified namespaces is restricted to
the designated broker groups.
+
+For more information, refer to the [Pulsar documentation on namespace
isolation](https://pulsar.apache.org/docs/next/administration-isolation/#isolation-levels).
+
+# History/Context
+In Apache Pulsar 2.7.1+, there was a flag introduced
(`enableNamespaceIsolationUpdateOnTime`) that controlled whether to unload
namespaces or not when a namespace isolation policy is applied.
https://github.com/apache/pulsar/pull/8976
+
+Later on, in 2.11, rework was done as part of
[PIP-149](https://github.com/apache/pulsar/issues/14365) to make get/set
isolationData calls async,
+which resulted in namespaces to always get unloaded irrespective of
`enableNamespaceIsolationUpdateOnTime` config, not adhering to this config at
all.
+
+And now in 3.3, `enableNamespaceIsolationUpdateOnTime` broker config was
deprecated as it no longer serves any purpose.
https://github.com/apache/pulsar/pull/22449
+
+# Motivation
+
+In Apache Pulsar 3.x, changing a namespace isolation policy results in
unloading all namespace bundles that match the namespace's regular expression
provided in the isolation policy.
+This can be problematic for cases where the regex matches a large subset of
namespaces, such as `tenant-x/.*`.
+One of such case is mentioned on this issue
[#23092](https://github.com/apache/pulsar/issues/23092) where policy change
resulted in 100+ namespace bundles to get unloaded.
+And broker exhausted all the available connections due to too many unload
calls happening at once resulting in 5xx response.
+Other issues that happens with this approach are huge latency spikes as topics
are unavailable until bundles are loaded back, increasing the pending produce
calls.
+The only benefit this approach serves is ensuring that all the namespaces
matching the policy regex will come to correct broker group.
+But when namespace bundles are already on the correct broker group (according
to the policy), unloading those namespaces doesn't serve any purpose.
+
+This PIP aims to address the need to either prevent unnecessary unloading or
provide a more granular approach to determine what should be unloaded.
+
+Some of the cases covered by this PIP are discussed in
[#23094](https://github.com/apache/pulsar/issues/23094) by @grssam.
+> - unload nothing as part of the set policy call
+> - unload every matching namespace as part of the policy set call
+> - unload only the changed namespaces (newly added + removed)
+
+# Goals
+
+## In Scope
+This PIP proposes a flag-based approach to control what should be unloaded
when an isolation policy is applied.
+The possible values for this flag are:
+- **all_matching**: Unload all the namespaces that matches either old or new
policy change.
+- **changed**: Only unload namespaces that are either added or removed due to
the policy change.
+- **none**: Do not unload anything. Unloading can occur naturally due to load
balancing or can be done manually using the unload admin call.
+
+This flag will be a part of isolation policy data with defaults. Objective is
to keep the default behavior unchanged on applying the new policy.
+
+## Out of Scope
+
+Applying concurrency reducer to limit how many async calls will happen in
parallel is out of the scope for this PIP.
+This should be addressed in a separate PIP, as solving the issue of infinite
asynchronous calls probably requires changes to broker configurations and is a
problem present in multiple areas.
+
+# Detailed Design
+
+## Design & Implementation Details
+
+A new flag will be introduced in `NamespaceIsolationData`.
+
+```java
+enum UnloadScope {
+ all_matching, // unloads everything, OLD ⋃ NEW
+ changed, // unload namespaces delta, (new ⋃ old) - (new ∩ old)
+ none, // skip unloading anything, ϕ
+};
+```
+Filters will be added based on the above when namespaces are selected for
unload in set policy call.
+`UnloadScope.all_matching` will be the default in current version.
+
+> **_NOTE:_**
+> For 3.x unchanged behaviour, the `all_matching` UnloadScope option should
only unload namespaces matching new policy (NEW). This matches the current
behavior and maintains backward compatibility from implementation POV.
+>
+> For 4.x,
+> 1. The behaviour for the `all_matching` flag should change to unload
everything matching either the old or new policy (union of both).
+> 2. The default flag value should be `changed`, so accidentally missing this
flag while applying the policy shouldn't impact workloads already on the
correct broker group.
+
+### Public API
+
+A new flag will be added in the NamespaceIsolationData. This changes the
request body when set policy API is called.
+To keep things backwards compatible, `unload_scope` will be optional. API
params will remain unchanged.
+
+Path: `/{cluster}/namespaceIsolationPolicies/{policyName}`
+```json
+{
+ "policy-name": {
+ "namespaces": [...],
+ "primary": [...],
+ "secondary": [...],
+ "auto_failover_policy": {
+ ...
+ },
+ "unload_scope": "all_matching|changed|none"
+ }
+}
+```
+
+### CLI
+
+```shell
+# set call will have an optional flag. Sample command as shown below:
+#
+pulsar-admin ns-isolation-policy set cluster-name policy-name --unload-scope
none
+# Possible values for unload-scope: [all_matching, changed, none]
+```
+
+# Backward & Forward Compatibility
+
+Added flag is optional, that doesn't require any changes to pre-existing
policy data. If the flag is not present then default value shall be considered.
+
+# Alternatives
+
+Boolean flag passed during set policy call to either unload the delta
namespaces (removed and added) without affecting unchanged namespaces or unload
nothing. PR: https://github.com/apache/pulsar/pull/23094
+
+Limitation: This approach does not consider cases where unloading is needed
for every matching namespace as part of the policy set call.
+Manual unloading would be required for unchanged namespaces not on the correct
broker group.
+
+# Links
+
+<!--
+Updated afterwards
+-->
+* Mailing List discussion thread:
https://lists.apache.org/thread/6f8k1typ48817w65pjh6orhks1smpbqg
+* Mailing List voting thread:
https://lists.apache.org/thread/0pj3llwpcy73mrs5s3l5t8kctn2mzyf7
+
+
+PS: This PIP should get cherry-picked to 3.0.x as it provides a way to resolve
the bug mentioned at [#23092](https://github.com/apache/pulsar/issues/23092)
which exist in all the production system today.
\ No newline at end of file