heesung-sn commented on PR #20748: URL: https://github.com/apache/pulsar/pull/20748#issuecomment-1691100334
> Another potential issue with ignoring protocol messages from clients during unloading is that it might increase the number of duplicate messages sent to consumers because ACK protocol messages will be sent to the source broker but not persisted to the consumer's ledger. Upon releasing state, `close broker topic sessions(e.g ledgers) without disconnecting producers/consumers(fenced)` action should block msg dispatchers on the source broker as the ledger will be closed. > My primary concerns about the pausing are in cases when the system is already degraded. Yes. The additional complexity of sending the `prepare` commands to all clients on the overloaded source broker could add pressure on top of it. I wonder if we need to consider the `prepare` command idea as a separate PIP on top of this proposal if we receive strong demand. On Wed, Aug 23, 2023 at 9:53 PM Michael Marshall ***@***.***> wrote: > ***@***.**** commented on this pull request. > ------------------------------ > > In pip/pip-281.md > <https://github.com/apache/pulsar/pull/20748#discussion_r1303801386>: > > > +sequenceDiagram > + participant Clients > + participant Owner Broker > + participant New Owner Broker > + participant Leader Broker > + Leader Broker ->> Owner Broker: "state:Releasing:" close topic > + Owner Broker ->> Owner Broker: close broker topic sessions(e.g ledgers) without disconnecting producers/consumers(fenced) > + Clients -->> Owner Broker: message pubs are ignored > + Owner Broker ->> New Owner Broker: "state:Assign:" assign new ownership > + New Owner Broker ->> Owner Broker: "state:Owned:" ack new ownership > + Owner Broker ->> Owner Broker: close the fenced broker topic sessions > + Owner Broker ->> Clients: close producers and consumers (with newOwnerBrokerUrl) > + Clients ->> New Owner Broker: immediately connect > > Still any inflight messages need to be ignored at source brokers. > > I agree that there will still be a window of messages that will be > received but not persisted. That is already part of the protocol. > > > - It needs brokers to additionally send the prepare command to the > clients, adding client-broker protocol complexity. > > I really think the protocol is designed to handle this complexity. The > resulting code might have multiple conditionals, but lifecycle changes are > complex, and decoupling events (telling a producer to stop sending messages > and telling producer which broker to connect to) seems like a more robust > design, in my opinion. > > Another potential issue with ignoring protocol messages from clients > during unloading is that it might increase the number of duplicate messages > sent to consumers because ACK protocol messages will be sent to the > source broker but not persisted to the consumer's ledger. This is always a > possibility (we work with at least once delivery after all), but our goal > is to make the protocol work in a way that minimizes these kinds of dupes. > The current protocol disconnects a consumer and the consumer which triggers > behaviors that should decrease dupes (though I haven't looked at the source > code in a while). This new protocol leaves the gap open a little longer. > > My primary concerns about the pausing are in cases when the system is > already degraded. It's possible that the latency is so negligible in most > cases that the dupes and extra network usage are not a problem. > > > - It adds the complexity of adding paused states in the clients. We > have java, c++, golang code to maintain, so this will increase the scope of > the work and complexity. > > These clients already handle the "pausing" when they get disconnected from > one broker and reconnect to the next. As such, I think they should all > already have this state built in. There will still need to be additional > logic to handle the new protocol message or protocol field, but the states > should be represented there already. > > I believe this "prepare" command can be added in the next > iteration(probably optimization phase 2), too, if we see the strong need > here. > > This is a valid perspective. My primary counterpoint is that having a > phase two implementation would mean three hand-off mechanisms, which > increases complexity more than switching to the "prepare" solution now. > > — > Reply to this email directly, view it on GitHub > <https://github.com/apache/pulsar/pull/20748#discussion_r1303801386>, or > unsubscribe > <https://github.com/notifications/unsubscribe-auth/AYVJ674AUPVC4HDQJ2LNY3TXW3M4PANCNFSM6AAAAAA2BCTKQY> > . > You are receiving this because you were mentioned.Message ID: > ***@***.***> > -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
