[GitHub] [pulsar] michaeljmarshall commented on a diff in pull request #20748: [pip][design] PIP-281: Optimize Bundle Unload(Transfer) Protocol for ExtensibleLoadManager

via GitHub Tue, 08 Aug 2023 21:54:14 -0700


michaeljmarshall commented on code in PR #20748:
URL: https://github.com/apache/pulsar/pull/20748#discussion_r1287892334



##########
pip/pip-281.md:
##########
@@ -0,0 +1,268 @@
+<!--
+RULES
+* Never place a link to an external site like Google Doc. The proposal should 
be in this issue entirely.
+* Use a spelling and grammar checker tools if available for you (there are 
plenty of free ones).
+
+PROPOSAL HEALTH CHECK
+I can read the design document and understand the problem statement and what 
you plan to change *without* resorting to a couple of hours of code reading 
just to start having a high level understanding of the change.
+
+THIS COMMENTS
+Please remove them when done.
+-->
+
+# Background knowledge
+
+- Pulsar broker load balancer periodically unloads bundles from overloaded 
brokers. During this unload process, previous owner brokers close topic 
sessions(e.g. producers, subscriptions(consumers), managed ledgers). When 
re-assigned, new owner brokers recreate the topic sessions.
+
+- Pulsar clients request `CommandLookupTopic` to lookup or assign owner 
brokers for topics and connect to them.
+
+- PIP-192, the extensible load balancer introduced the bundle state channel 
that event-sources this unloading process in a state machine manner, from 
`releasing,` `assigned`, to `owned` state order. At `releasing,` the owner 
broker "releases" the bundle ownership(close topic sessions).
+
+- PIP-192, the extensible load balancer introduced TransferShedder, a new 
shedding strategy, which pre-assigns new owner brokers beforehand.
+
+
+# Motivation
+
+- When unloading closes many topic sessions, then many clients need to request 
CommandLookupTopic at the same time, which could cause many lookup requests on 
brokers. This unloading process can be further optimized if we can let the 
client directly connect to the new owner broker without following 
`CommandLookupTopic` requests.
+- In the new load balancer(pip-192), since the owner broker is already known, 
we can modify the close command protocol to pass the new destination broker URL 
and skip the lookup requests.
+- Also, when unloading, we can gracefully shutdown ledgers -- we always close 
old managed ledgers first and then recreate it on the new owner without 
conflicts.
+
+# Goals
+- Remove clients' lookup requests in the unload protocol to reduce the publish 
latency spike and e2e latency spike during
+unloading and also to resolve bottlenecks (of thundering lookups) when there 
are a large number of topics in a cluster.
+- Gracefully shutdown managed ledgers before new owners create them to reduce 
possible race-conditions between ledger close and ledger creations during 
unloading.
+
+## In Scope
+
+<!--
+What this PIP intend to achieve once It's integrated into Pulsar.
+Why does it benefit Pulsar.
+-->
+
+- This change will be added in the extensible load balancer.
+
+## Out of Scope
+
+<!--
+Describe what you have decided to keep out of scope, perhaps left for a 
different PIP/s.
+-->
+
+- This won't change the existing load balancer behavior(modular load manager).
+
+
+
+# High Level Design
+
+<!--
+Describe the design of your solution in *high level*.
+Describe the solution end to end, from a birds-eye view.
+Don't go into implementation details in this section.
+
+I should be able to finish reading from beginning of the PIP to here 
(including) and understand the feature and 
+how you intend to solve it, end to end.
+
+DON'T
+* Avoid code snippets, unless it's essential to explain your intent.
+-->
+
+To achieve the goals above, we could modify the bundle transfer protocol by 
the following.
+The proposed protocol change is based on the bundle states from PIP-192. 
+
+Basically, we could close the ledgers only in the releasing state and finally 
disconnect clients in the owned state with destination broker urls. The clients 
will directly connect to the pre-assigned destination broker url without 
lookups.  Meanwhile, during this transfer, any produced messages will be 
ignored by the source broker.
+
+Current Unload and Lookup Sequence in Extensible Load Balancer
+```mermaid
+sequenceDiagram
+    participant Clients
+    participant Owner Broker
+    participant New Owner Broker
+    participant Leader Broker
+    Leader Broker ->> Owner Broker: "state:Releasing:" close topic
+    Owner Broker ->> Owner Broker: close broker topic sessions
+    Owner Broker ->> Clients: close producers and consumers
+    Clients ->> Clients: reconnecting (inital delay 100ms)
+    Owner Broker ->> New Owner Broker: "state:Assign:" assign new ownership
+    New Owner Broker ->> Owner Broker: "state:Owned:" ack new ownership
+    Clients ->> Owner Broker: lookup
+    Owner Broker ->> Clients: redirect
+    Clients ->> New Owner Broker: lookup
+    New Owner Broker ->> Clients: return(connected)
+```
+
+Proposed Unload Sequence in Extensible Load Balancer without Lookup
+```mermaid
+sequenceDiagram
+    participant Clients
+    participant Owner Broker
+    participant New Owner Broker
+    participant Leader Broker
+    Leader Broker ->> Owner Broker: "state:Releasing:" close topic
+    Owner Broker ->> Owner Broker: close broker topic sessions(e.g ledgers) 
without disconnecting producers/consumers(fenced)
+    Clients -->> Owner Broker: message pubs are ignored

Review Comment:
   Why ignore messages? This is inefficient and allows for unnecessary data 
transfer, which in a cloud setting can be quite expensive. In your testing, how 
long does the state transition from `Assign` to `Owned` take? In another 
comment, I offer an alternative.



##########
pip/pip-281.md:
##########
@@ -0,0 +1,268 @@
+<!--
+RULES
+* Never place a link to an external site like Google Doc. The proposal should 
be in this issue entirely.
+* Use a spelling and grammar checker tools if available for you (there are 
plenty of free ones).
+
+PROPOSAL HEALTH CHECK
+I can read the design document and understand the problem statement and what 
you plan to change *without* resorting to a couple of hours of code reading 
just to start having a high level understanding of the change.
+
+THIS COMMENTS
+Please remove them when done.
+-->
+
+# Background knowledge
+
+- Pulsar broker load balancer periodically unloads bundles from overloaded 
brokers. During this unload process, previous owner brokers close topic 
sessions(e.g. producers, subscriptions(consumers), managed ledgers). When 
re-assigned, new owner brokers recreate the topic sessions.
+
+- Pulsar clients request `CommandLookupTopic` to lookup or assign owner 
brokers for topics and connect to them.
+
+- PIP-192, the extensible load balancer introduced the bundle state channel 
that event-sources this unloading process in a state machine manner, from 
`releasing,` `assigned`, to `owned` state order. At `releasing,` the owner 
broker "releases" the bundle ownership(close topic sessions).
+
+- PIP-192, the extensible load balancer introduced TransferShedder, a new 
shedding strategy, which pre-assigns new owner brokers beforehand.
+
+
+# Motivation
+
+- When unloading closes many topic sessions, then many clients need to request 
CommandLookupTopic at the same time, which could cause many lookup requests on 
brokers. This unloading process can be further optimized if we can let the 
client directly connect to the new owner broker without following 
`CommandLookupTopic` requests.
+- In the new load balancer(pip-192), since the owner broker is already known, 
we can modify the close command protocol to pass the new destination broker URL 
and skip the lookup requests.
+- Also, when unloading, we can gracefully shutdown ledgers -- we always close 
old managed ledgers first and then recreate it on the new owner without 
conflicts.
+
+# Goals
+- Remove clients' lookup requests in the unload protocol to reduce the publish 
latency spike and e2e latency spike during
+unloading and also to resolve bottlenecks (of thundering lookups) when there 
are a large number of topics in a cluster.
+- Gracefully shutdown managed ledgers before new owners create them to reduce 
possible race-conditions between ledger close and ledger creations during 
unloading.
+
+## In Scope
+
+<!--
+What this PIP intend to achieve once It's integrated into Pulsar.
+Why does it benefit Pulsar.
+-->
+
+- This change will be added in the extensible load balancer.
+
+## Out of Scope
+
+<!--
+Describe what you have decided to keep out of scope, perhaps left for a 
different PIP/s.
+-->
+
+- This won't change the existing load balancer behavior(modular load manager).
+
+
+
+# High Level Design
+
+<!--
+Describe the design of your solution in *high level*.
+Describe the solution end to end, from a birds-eye view.
+Don't go into implementation details in this section.
+
+I should be able to finish reading from beginning of the PIP to here 
(including) and understand the feature and 
+how you intend to solve it, end to end.
+
+DON'T
+* Avoid code snippets, unless it's essential to explain your intent.
+-->
+
+To achieve the goals above, we could modify the bundle transfer protocol by 
the following.
+The proposed protocol change is based on the bundle states from PIP-192. 
+
+Basically, we could close the ledgers only in the releasing state and finally 
disconnect clients in the owned state with destination broker urls. The clients 
will directly connect to the pre-assigned destination broker url without 
lookups.  Meanwhile, during this transfer, any produced messages will be 
ignored by the source broker.
+
+Current Unload and Lookup Sequence in Extensible Load Balancer
+```mermaid
+sequenceDiagram
+    participant Clients
+    participant Owner Broker
+    participant New Owner Broker
+    participant Leader Broker
+    Leader Broker ->> Owner Broker: "state:Releasing:" close topic
+    Owner Broker ->> Owner Broker: close broker topic sessions
+    Owner Broker ->> Clients: close producers and consumers
+    Clients ->> Clients: reconnecting (inital delay 100ms)
+    Owner Broker ->> New Owner Broker: "state:Assign:" assign new ownership
+    New Owner Broker ->> Owner Broker: "state:Owned:" ack new ownership
+    Clients ->> Owner Broker: lookup
+    Owner Broker ->> Clients: redirect
+    Clients ->> New Owner Broker: lookup
+    New Owner Broker ->> Clients: return(connected)
+```
+
+Proposed Unload Sequence in Extensible Load Balancer without Lookup
+```mermaid
+sequenceDiagram
+    participant Clients
+    participant Owner Broker
+    participant New Owner Broker
+    participant Leader Broker
+    Leader Broker ->> Owner Broker: "state:Releasing:" close topic
+    Owner Broker ->> Owner Broker: close broker topic sessions(e.g ledgers) 
without disconnecting producers/consumers(fenced)
+    Clients -->> Owner Broker: message pubs are ignored
+    Owner Broker ->> New Owner Broker: "state:Assign:" assign new ownership
+    New Owner Broker ->> Owner Broker: "state:Owned:" ack new ownership
+    Owner Broker ->> Owner Broker: close the fenced broker topic sessions
+    Owner Broker ->> Clients: close producers and consumers (with 
newOwnerBrokerUrl)
+    Clients ->> New Owner Broker: immediately connect

Review Comment:
   I am wondering if there are any opportunities to decrease the latency 
further by making some of these steps concurrent. I spent time thinking about 
this last year, but I never had the chance to do anything formal. In my view, 
the goal should be to decouple any steps that do not actually need to be done 
in order as long as we can gracefully recover without introducing too much 
complexity. Hopefully these thoughts are helpful. :)
   
   First, here is one of my core assumptions:
   
   Topic transfer due to load balancing is "voluntary" (or at the very least, 
does not need to be immediate) so the goal should be to minimize latency as 
much as possible. Therefore, doing something slightly inefficient on the server 
side or in the protocol to minimize the observed latency is preferable.
   
   Based on that assumption, I think the following things could help decrease 
transfer latency:
   
   1. Add a state to the `ServiceUnitState` enum that allows for a target 
broker to claim being the "next" broker for a topic without actually becoming 
the owner. Then, the next broker can do things like open the next ledger for 
the topic without actually being the "live" owner for the topic. This could be 
"wasteful" in certain failure scenarios, but that extra complexity is 
preferable to latency spikes.
   2. The new state added in step 1 would also give us the chance to tell 
clients to eagerly connect to the new broker before producers/consumers are 
disconnected so that the connection is ready to be hot swapped when the 
producers/consumers are closed by the broker.
   
   In thinking about potential flow, I see two alternatives to your proposed 
design.
   
   In this diagram, we would add a new protocol message that essentially tells 
a client to wait for the new broker url for the producer/consumer. In the event 
of failure (the connection closes), the client would simply do a normal lookup.
   
   ```mermaid
   sequenceDiagram
       participant Clients
       participant Owner Broker
       participant New Owner Broker
       participant Leader Broker
       Leader Broker ->> Owner Broker: "state:Releasing:" close topic
       Owner Broker ->> Owner Broker: close broker topic sessions
       Owner Broker ->> Clients: close producers and consumers (and indicate 
new broker command will come)
       Owner Broker ->> New Owner Broker: "state:Assign:" assign new ownership
       Owner Broker ->> Clients: tell clients to connect to new owner broker
       Clients ->> New Owner Broker: immediately connect
       New Owner Broker ->> Owner Broker: "state:Owned:" ack new ownership
       Owner Broker ->> Clients: tell clients to create producers/consumers
       Clients ->> New Owner Broker: create producers/consumers
   ```
   
   This second diagram implements both points from above. I think it is 
probably too complicated, though I do like the idea of really trying to remove 
all obstacles for latency that we possibly can.
   
   ```mermaid
   sequenceDiagram
       participant Clients
       participant Owner Broker
       participant New Owner Broker
       participant Leader Broker
       Leader Broker ->> Owner Broker: "state:Releasing:" close topic
       Owner Broker ->> New Owner Broker: "state:WeaklyAssign:" weakly assign 
new ownership
       New Owner Broker ->> New Owner Broker: asynchronously prepare to own 
topic
       New Owner Broker ->> Owner Broker: "state:WeaklyOwned:" ack new weak 
ownership
       Owner Broker ->> Clients: tell clients to connect to new owner broker 
(net new protocol message)
       Clients ->> New Owner Broker: immediately connect
       Owner Broker ->> Owner Broker: close broker topic sessions
       Owner Broker ->> Clients: close producers and consumers (and indicate 
new broker url)
       Owner Broker ->> New Owner Broker: "state:Assign:" assign new ownership
       New Owner Broker ->> Owner Broker: "state:Owned:" ack new ownership
       Owner Broker ->> Clients: tell clients to create producers/consumers 
(net new protocol command)
       Clients ->> New Owner Broker: create producers/consumers
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [pulsar] michaeljmarshall commented on a diff in pull request #20748: [pip][design] PIP-281: Optimize Bundle Unload(Transfer) Protocol for ExtensibleLoadManager

Reply via email to