On Tue, Aug 8, 2023 at 9:56 PM Michael Marshall <mmarsh...@apache.org> wrote:
> Thanks for your proposal, Heesung. Sorry for my late response. Just > want to make a few points to hopefully improve the implementation. > Overall, I think this feature is absolutely the right direction for > Pulsar and will be a great improvement. I remember discussing this at > some community meetings last year. See the last paragraph of [0]. > > In general, my primary question is about the protocol changes you're > proposing, both the commands and the way the commands are sent. > > > I am worried about the backward compatibility if we add a new proto > message > > -- how can the old versions of clients handle this new proto message? > > The protocolVersion [1] object in the proto definition is how we > guarantee backwards and forwards compatibility for all combinations of > brokers/clients. The client and the broker each share their version > during the handshake and then they only use commands known to both > peers. > > We can introduce a new command here if we want/need. The nuance is > that the server logic will branch depending on the client's > protocolVersion. > Good Point. We can introduce new commands here instead of the optional fields by branching the protocolVersion. If the community thinks the new commands are cleaner, I can create new commands for this, as the protocolVersion guarantees the backward-compatbility. However, with this optional field approach, we don't need to introduce a new protocol version, which could be arguably beneficial. I will leave the decision to the community. Let me know if I have to use new commands and increase the protocolVersion for this change. > Next, I am wondering if there are any opportunities to decrease the > latency even more than are proposed in the PIP by introducing some > concurrency to decouple operations like creating new connections and > initializing resources used by each topic (like a managed ledger). > Because I used a mermaid diagram, I put the majority of this point in > a PR comment [2]. > I will be replying to the PR. > Thanks, > Michael > > [0] https://lists.apache.org/thread/ghlsprmqgkt8lfv634groy93c5dm1k1g > [1] > https://github.com/apache/pulsar/blob/15453964f6e6086b0c5e34736958cf749f16f5e1/pulsar-common/src/main/proto/PulsarApi.proto#L242-L268 > [2] https://github.com/apache/pulsar/pull/20748#discussion_r1287927655 > > On Fri, Aug 4, 2023 at 3:41 PM Heesung Sohn > <heesung.s...@streamnative.io.invalid> wrote: > > > > Hi, > > > > I updated the PIP to add more details about the recent discussion. > > > > Thanks, > > Heesung > > > > On Thu, Aug 3, 2023 at 6:46 PM Heesung Sohn < > heesung.s...@streamnative.io> > > wrote: > > > > > > > > > > > On Wed, Aug 2, 2023 at 8:02 PM PengHui Li <peng...@apache.org> wrote: > > > > > >> > One of the advantages of this proposal (add these optional > dstBrokerUrl > > >> fields in the existing CommandCloseProducer and CommandCloseConsumer) > is > > >> the backward compatibility > > >> > > >> Ah, yes. I missed this point. It's a good convince. > > >> > > >> > After the client connects to the destination broker, the next > > >> command(e.g. > > >> > ProducerCommand) requires > > >> > the destination broker to check the ownership again against its > local > > >> table > > >> > view of the bundle state channel. > > >> > > >> > Upon this local ownership check, there could be the following > scenarios: > > >> > > >> > Happy case: > > >> > - If the ownership state is `owned ` and the current broker is > indeed > > >> the > > >> > owner, the command completes. > > >> > Unhappy cases: > > >> > - If the ownership state is `owned ` and the current broker is not > the > > >> > owner, the command fails > > >> > (the broker returns an error to the client), and the client tries to > > >> find > > >> > the true new owner by lookups. > > >> > The global bundle state channel is eventually consistent, and the > > >> lookups > > >> > should eventually converge. > > >> > - if the ownership change is still in progress(`releasing`, > > >> `assigning`), > > >> > this check will be deferred > > >> > until the state becomes `owned` with a timeout. > > >> > > >> Could you please also update the explanation in the proposal? > > >> It looks like what is guaranteed and what is not in the new proposal. > > >> And why the mechanism can work with the unhappy case. > > >> > > >> Both the existing mechanism and the new mechanism follow the > > >> eventual consistency. The nuance is that the new mechanism will change > > >> the bundle owner first and then unload the topics. For the broker > logs, > > >> will more error messages related to the fenced ledger be introduced? > > >> Or the client reconnects will happen after the owner cache is updated? > > >> > > >> > > > As described in the protocol change in the PIP, the goal is to have > fewer > > > fenced ledger errors because the protocol change helps the client > > > reconnection(open ledger) > > > happen at the very end. > > > > > > > > >> If the load manager crashed before unloading the topics (close > producers > > >> and consumers). > > >> Will the old broker provide service for this topic until the client > > >> triggers a > > >> new lookup? > > >> > > > > > > Sorry, I am not sure if I understood your question here, but > > > When the destination or source broker crashes in the middle, > > > the leader will find the orphan state and clean the ownership by > selecting > > > a new owner, and the client will reconnect to it. > > > During this transfer process, if alive, the source broker will serve > the > > > topic according to the protocol described in the PIP. > > > > > > > > >> Sorry, Another question about PIP-192. Is it required by the new > > >> mechanism? > > >> Will the situation also be improved without enabling PIP-192? > > >> We have added the new owner broker to the close producer and consumer > > >> command. > > >> We can also update the owner and then close the producer and consumers > > >> with > > >> the new owner. > > >> > > > > > > This PIP is based on PIP-192 because the proposed protocol change is > based > > > on the bundle states from PIP-192. > > > I assume the community could raise other PIPs to utilize the close > > > producer and consumer command changes for other logic too. > > > > > > > > >> > > >> Regards, > > >> Penghui > > >> > > >> On Thu, Aug 3, 2023 at 4:38 AM Heesung Sohn > > >> <heesung.s...@streamnative.io.invalid> wrote: > > >> > > >> > Hi PengHui, > > >> > Thank you for your questions. Please find my answers inline. > > >> > > > >> > On Wed, Aug 2, 2023 at 3:47 AM PengHui Li <peng...@apache.org> > wrote: > > >> > > > >> > > Hi Heesung, > > >> > > > > >> > > I'm sorry for not getting back to you sooner. > > >> > > The motivation and the solution look good to me. > > >> > > > > >> > > I just want to leave some comments to make the proposal clear > > >> > > for users and developers. > > >> > > > > >> > > > > >> > > > > >> > > > >> > ---------------------------------------------------------------------------------------------------- > > >> > > > > >> > > We should have a section to describe the values that will be added > > >> > > by this proposal. > > >> > > > > >> > > - Reduced the publish latency spike and e2e latency spike during > the > > >> > > reconnecting > > >> > > - Mitigated bottleneck(lookup) of a large number of topics in a > > >> cluster > > >> > > > > >> > > Maybe more, I can only think about the above two values. > > >> > > > > >> > > > > >> > Good point. I will clarify this part in the PIP. > > >> > > > >> > > > >> > > > >> > ---------------------------------------------------------------------------------------------------- > > >> > > > > >> > > For the proto changes, the client side can also use the HTTP > lookup > > >> > > service. > > >> > > We should also mention it in this proposal. Instead of adding > four new > > >> > > fields to > > >> > > the CommandCloseProducer and CommandCloseConsumer, maybe adding a > new > > >> > > message BrokerIdentifier, or others is better. > > >> > > > > >> > > > >> > I am worried about the backward compatibility if we add a new proto > > >> message > > >> > -- how can the old versions of clients handle this new proto > message? > > >> > > > >> > One of the advantages of this proposal (add these optional > dstBrokerUrl > > >> > fields in the existing CommandCloseProducer and > CommandCloseConsumer) is > > >> > the backward compatibility -- older clients can still work in the > older > > >> way > > >> > by going through the lookup requests. > > >> > > > >> > Ref: https://protobuf.dev/overview/#updating-defs > > >> > Updating Proto Definitions Without Updating Code > > >> > <https://protobuf.dev/overview/#updating-defs> > > >> > > > >> > It’s standard for software products to be backward compatible, but > it is > > >> > less common for them to be forward compatible. As long as you follow > > >> > some simple > > >> > practices <https://protobuf.dev/programming-guides/proto3/#updating > > > > >> when > > >> > updating .proto definitions, old code will read new messages without > > >> > issues, ignoring any newly added fields. ... > > >> > > > >> > > > >> > > > > >> > > > > >> > > > >> > ---------------------------------------------------------------------------------------------------- > > >> > > > > >> > > There are two points we need to clarify in the proposal. > > >> > > > > >> > > - If the client reconnects before the new owner takes ownership. > If > > >> the > > >> > > producer closed > > >> > > after the new broker took ownership, would it be a potential > > >> > consistency > > >> > > issue? > > >> > > > >> > - If the owner changed again before the client reconnection. How did > > >> Pulsar > > >> > > handle this case? > > >> > > As I understand, it's fine because the broker will check the > > >> ownership > > >> > > again when handling > > >> > > producer and consumer creation. If it can be explained in the > > >> > proposal, > > >> > > it should be great > > >> > > for developers to understand it deeply. > > >> > > > > >> > > > >> > After the client connects to the destination broker, the next > > >> command(e.g. > > >> > ProducerCommand) requires > > >> > the destination broker to check the ownership again against its > local > > >> table > > >> > view of the bundle state channel. > > >> > > > >> > Upon this local ownership check, there could be the following > scenarios: > > >> > > > >> > Happy case: > > >> > - If the ownership state is `owned ` and the current broker is > indeed > > >> the > > >> > owner, the command completes. > > >> > Unhappy cases: > > >> > - If the ownership state is `owned ` and the current broker is not > the > > >> > owner, the command fails > > >> > (the broker returns an error to the client), and the client tries to > > >> find > > >> > the true new owner by lookups. > > >> > The global bundle state channel is eventually consistent, and the > > >> lookups > > >> > should eventually converge. > > >> > - if the ownership change is still in progress(`releasing`, > > >> `assigning`), > > >> > this check will be deferred > > >> > until the state becomes `owned` with a timeout. > > >> > > > >> > > > >> > > Regards, > > >> > > Penghui > > >> > > > > >> > > On Wed, Aug 2, 2023 at 1:18 AM Heesung Sohn > > >> > > <heesung.s...@streamnative.io.invalid> wrote: > > >> > > > > >> > > > Hi, > > >> > > > > > >> > > > It has been over three weeks since this discussion started, and > I > > >> > replied > > >> > > > to the questions in the PIP. > > >> > > > I will send out a vote email tomorrow if looking good. > > >> > > > > > >> > > > Thanks, > > >> > > > Heesung > > >> > > > > > >> > > > On Tue, Jul 25, 2023 at 1:12 PM Heesung Sohn < > > >> > > heesung.s...@streamnative.io > > >> > > > > > > >> > > > wrote: > > >> > > > > > >> > > > > Hi, > > >> > > > > > > >> > > > > I replied to the comments from Kai and Ran in the PIP. > > >> > > > > > > >> > > > > Please review the PIP by any chance. > > >> > > > > > > >> > > > > Thank you, > > >> > > > > Heesung > > >> > > > > > > >> > > > > On Thu, Jul 6, 2023 at 4:32 PM Heesung Sohn < > > >> > > > heesung.s...@streamnative.io> > > >> > > > > wrote: > > >> > > > > > > >> > > > >> Hi dev, > > >> > > > >> > > >> > > > >> I proposed this PIP, > https://github.com/apache/pulsar/pull/20748 > > >> , > > >> > to > > >> > > > >> make unloaded clients directly and gracefully connect to new > > >> owner > > >> > > > brokers > > >> > > > >> without lookups. > > >> > > > >> > > >> > > > >> Please take a look and let me know what you think. > > >> > > > >> > > >> > > > >> Regards, > > >> > > > >> Heesung > > >> > > > >> > > >> > > > > > > >> > > > > > >> > > > > >> > > > >> > > > >