Hi, please find my answers inline.

On Sun, Oct 23, 2022 at 7:11 PM PengHui Li <peng...@apache.org> wrote:
> Sorry, Heesung,
>
> I think I used a confusing name, "leader election".
> Actually, I meant to say "topic owner".
>
> From my understanding, the issue is that we are using the table view for a
> compacted topic.
> We will always get the last value of a key, but that will not work for the
> "broker ownership conflicts handling".
> First, we need to change the table view so that it is able to keep only
> the first value of a key.
> Even without compaction, the "broker ownership conflicts handling" will
> still work correctly, right?

Yes, the BSC conflict resolution needs to take the first valid value (state
change) per key, instead of just the latest value. For a non-compacted
topic, this table-view update (taking a strategic cache update) alone will
serve its purpose.

> But if the table view works on a compacted topic, the table view will show
> the last value of a key after the compaction. So you also want to change
> the topic compaction to make sure the table view will always show the
> first value of a key.

Yes.

> Maybe I missed something here.
>
> My point is that we can just write the owner (final, the first value of
> the key) broker back to the topic,
> so that the table view will always show the first value of the key, both
> before and after the topic compaction.

But how do we resolve conflicts if the tail messages of the topic are
non-terminal states?

1. bundle 1 assigned by broker 1 // in the process of assignment
2. bundle 1 assigned by broker 2
3. bundle 2 released by broker 1 // in the process of transfer
4. bundle 2 assigned by broker 1
5. bundle 3 splitting by broker 1 // in the process of split
6. bundle 3 assigned by broker 2

Regards,
Heesung

> Thanks,
> Penghui
>
> On Sat, Oct 22, 2022 at 12:23 AM Heesung Sohn
> <heesung.s...@streamnative.io.invalid> wrote:
>
> > Hi Penghui,
> >
> > I put my answers inline.
> >
> > On Thu, Oct 20, 2022 at 5:11 PM PengHui Li <peng...@apache.org> wrote:
> >
> > > Hi Heesung.
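[Aside: to make the "first valid state change per key wins" rule discussed above concrete, here is a minimal sketch. It is my own illustration under simplified assumptions; the `Event` record, the `resolve` method, and the validity rule (a second "assigned" for an already-assigned bundle is a conflict, while "released"/"splitting" may be followed up) are not code from PIP-192 or PIP-215.]

```java
import java.util.*;

// Hypothetical sketch: resolving bundle-ownership events by accepting only
// the first *valid* state change per bundle, rather than the latest value.
public class FirstValidWins {
    // "released"/"splitting" are non-terminal: a follow-up event is expected.
    record Event(String bundle, String state, String broker) {}

    // Accept an event only if it is the first one seen for the bundle, or a
    // follow-up to a non-terminal state; a conflicting later event is ignored.
    static Map<String, Event> resolve(List<Event> log) {
        Map<String, Event> accepted = new LinkedHashMap<>();
        for (Event e : log) {
            Event cur = accepted.get(e.bundle());
            if (cur == null) {
                accepted.put(e.bundle(), e);   // first value wins
            } else if (!cur.state().equals("assigned")) {
                accepted.put(e.bundle(), e);   // non-terminal -> allow follow-up
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        List<Event> log = List.of(
            new Event("bundle-1", "assigned", "broker-1"),  // in-progress assignment
            new Event("bundle-1", "assigned", "broker-2"),  // conflicting: ignored
            new Event("bundle-2", "released", "broker-1"),  // in-progress transfer
            new Event("bundle-2", "assigned", "broker-1"),  // follow-up: accepted
            new Event("bundle-3", "splitting", "broker-1"), // in-progress split
            new Event("bundle-3", "assigned", "broker-2")); // follow-up: accepted
        resolve(log).forEach((b, e) ->
            System.out.println(b + " -> " + e.state() + " by " + e.broker()));
    }
}
```

A naive last-value-wins cache would give bundle-1 to broker-2 here; the first-valid resolution keeps broker-1's earlier in-progress assignment.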
> > > Is it possible to send the promoted value to the topic again to
> > > achieve eventual consistency?
> >
> > Yes, as long as the state change is valid, BSC will accept it and
> > broadcast it to all brokers.
> >
> > > For example:
> > >
> > > We have 3 brokers: broker-a, broker-b, and broker-c.
> > > The messages for leader election could be "own: broker-b",
> > > "own: broker-c", "own: broker-a".
> > > The broker-b will win in the end.
> > > The broker-b can write a new message "own: broker-b" to the topic.
> > > After the topic compaction, only the broker-b will be present in the
> > > topic. Does it work?
> >
> > The proposal does not use a topic for leader election because of the
> > circular dependency. The proposal uses the metadata store, ZooKeeper, to
> > elect the leader broker(s) of BSC.
> > This part is explained in the "Bundle State Channel Owner Selection and
> > Discovery" section in PIP-192.
> >
> > *Bundle State Channel Owner Selection and Discovery*
> >
> > *Bundle State Channel(BSC) is another topic, and because of its circular
> > dependency, we can't use the BundleStateChannel to find the owner broker
> > of the BSC topic. For example, when a cluster starts, each broker needs
> > to initiate BSC TopicLookUp(to find the owner broker) in order to
> > consume the messages in BSC. However, initially, each broker does not
> > know which broker owns the BSC.*
> >
> > *The ZK leader election can be a good option to break this circular
> > dependency, like the following.*
> >
> > *Channel Owner Selection*
> >
> > *The cluster can use the ZK leader election to select the owner broker.
> > If the owner becomes unavailable, one of the followers will become the
> > new owner.
> > We can elect the owner for each bundle state channel partition.*
> >
> > *Channel Owner Discovery*
> >
> > *Then, in brokers' TopicLookUp logic, we will add a special case to
> > return the current leader(the elected BSC owner) for the BSC topics.*
>
> > > Maybe I missed something.
> > >
> > > Thanks,
> > > Penghui
> > >
> > > On Thu, Oct 20, 2022 at 1:30 AM Heesung Sohn
> > > <heesung.s...@streamnative.io.invalid> wrote:
> > >
> > > > Oops.
> > > > I forgot to mention another important item. I added it below (in
> > > > bold).
> > > >
> > > > Pros:
> > > > - It supports more distributed load balance operations (bundle
> > > >   assignment) in a sequentially consistent manner.
> > > > - For really large clusters, by a partitioned system topic, BSC can
> > > >   be more scalable than the current single-leader coordination
> > > >   solution.
> > > > - The load balance commands (across brokers) are sent via event
> > > >   sourcing (more reliable/transparent/easy-to-track) instead of RPC
> > > >   with retries.
> > > > *- Bundle ownerships can be cached in the topic table-view from BSC.
> > > >   (no longer needs to store bundle ownership in the metadata
> > > >   store(ZK))*
> > > >
> > > > Cons:
> > > > - It is a new implementation and will require significant effort to
> > > >   stabilize.
> > > >   (Based on our PoC code, I think the event sourcing handlers are
> > > >   easier to understand and follow. Also, this new load balancer will
> > > >   be pluggable (implemented in new classes), so it should not break
> > > >   the existing load balance logic. Users will be able to configure
> > > >   the old or new broker load balancer.)
> > > >
> > > > On Wed, Oct 19, 2022 at 10:17 AM Heesung Sohn <
> > > > heesung.s...@streamnative.io> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > On Wed, Oct 19, 2022 at 2:06 AM 丛搏 <congbobo...@gmail.com> wrote:
> > > > >
> > > > >> Hi, Heesung:
> > > > >> I have some doubts.
> > > > >> I reviewed your PIP-192: New Pulsar Broker Load Balancer. I found
> > > > >> that unloading uses the leader broker, while (Assigning, Return)
> > > > >> uses the lookup request broker. Why does (Assigning, Return) not
> > > > >> use a leader broker?
> > > > >> I can think of a few reasons:
> > > > >> 1. reduce leader broker pressure
> > > > >> 2. does not strongly depend on the leader broker
> > > > >
> > > > > Yes, one of the goals of the PIP-192 is to distribute the load
> > > > > balance logic to individual brokers (bundle assignment and bundle
> > > > > split).
> > > > >
> > > > >> If (Assigning, Return) does not depend on the leader, it will
> > > > >> bring the following problems:
> > > > >
> > > > > I assume what you meant by `(Assigning, Return) does not depend on
> > > > > the leader` is the distributed topic assignment here (concurrent
> > > > > bundle assignment across brokers).
> > > > >
> > > > >> 1. leader clear bundle op and (Assigning, Return) will run at the
> > > > >> same time. It will cause many requests to be retried, and the
> > > > >> broker will be in chaos for a long time.
> > > > >
> > > > > I assume `leader clear bundle op` means bundle unloading, and
> > > > > `many requests` means topic lookup requests (bundle assignment
> > > > > requests).
> > > > > The leader unloads only high-loaded bundles in the "Owned" state.
> > > > > So, the leader does not unload bundles that are in the process of
> > > > > assignment.
> > > > > Even if there are conflicting state changes, only the first valid
> > > > > state change will be accepted in BSC (as explained in the Conflict
> > > > > State Resolution (Race Conditions) section in the PIP).
> > > > >
> > > > > Also, another goal of this PIP-192 is to reduce client lookup
> > > > > retries.
> > > > > In BSC, the client lookup response will be deferred (max x secs)
> > > > > until the bundle state finally becomes "Owned".
> > > > >
> > > > >> 2. bundle State Channel(BSC) owner depends on the leader broker;
> > > > >> this also makes topic transfer strongly dependent on the leader.
> > > > >
> > > > > BSC will use separate leader znodes to decide the owner brokers of
> > > > > the internal BSC system topic. As described in the "Bundle State
> > > > > and Load Data TableView Scalability" section in PIP-192, we could
> > > > > use a partitioned topic (configurable) for this BSC system topic.
> > > > > Then, there could be a separate owner broker for each partition
> > > > > (e.g. zk leader znodes, /loadbalance/leader/part-1-owner,
> > > > > part-2-owner, ..etc).
> > > > >
> > > > >> 3. the code becomes more complex and harder to maintain
> > > > >>
> > > > >> What tradeoffs are the current implementations based on?
> > > > >
> > > > > Here are some Pros and Cons of BSC I can think of.
> > > > >
> > > > > Pros:
> > > > > - It supports more distributed load balance operations (bundle
> > > > >   assignment) in a sequentially consistent manner.
> > > > > - For really large clusters, by a partitioned system topic, BSC
> > > > >   can be more scalable than the current single-leader coordination
> > > > >   solution.
> > > > > - The load balance commands (across brokers) are sent via event
> > > > >   sourcing (more reliable/transparent/easy-to-track) instead of
> > > > >   RPC with retries.
> > > > >
> > > > > Cons:
> > > > > - It is a new implementation and will require significant effort
> > > > >   to stabilize.
> > > > >   (Based on our PoC code, I think the event sourcing handlers are
> > > > >   easier to understand and follow the logic.
> > > > >   Also, this new load balancer will be pluggable (implemented in
> > > > >   new classes), so it should not break the existing load balance
> > > > >   logic. Users will be able to configure the old or new broker
> > > > >   load balancer.)
> > > > >
> > > > > Thank you for sharing your questions about PIP-192 here. But I
> > > > > think this PIP-215 is independent of PIP-192 (though PIP-192 needs
> > > > > some of the features in PIP-215).
> > > > >
> > > > > Thanks,
> > > > > Heesung
> > > > >
> > > > >> Thanks,
> > > > >> bo
> > > > >>
> > > > >> Heesung Sohn <heesung.s...@streamnative.io.invalid> wrote on
> > > > >> Wed, Oct 19, 2022 at 07:54:
> > > > >> >
> > > > >> > Hi pulsar-dev community,
> > > > >> >
> > > > >> > I raised a pip to discuss: PIP-215: Configurable Topic
> > > > >> > Compaction Strategy
> > > > >> >
> > > > >> > PIP link: https://github.com/apache/pulsar/issues/18099
> > > > >> >
> > > > >> > Regards,
> > > > >> > Heesung
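[Aside for readers following the thread: a configurable compaction strategy of the kind PIP-215 proposes could look roughly like the sketch below. The `CompactionStrategy` interface and `shouldKeepLeft` name are assumptions for illustration only; see the PIP link above for the actual proposal.]

```java
import java.util.*;

// Hypothetical sketch of a configurable compaction strategy: the compactor
// asks the strategy whether to keep the previously retained value for a key
// or replace it with the newer one.
public class CompactionStrategyDemo {
    interface CompactionStrategy<T> {
        // Return true to keep the earlier value (prev) and drop the newer (cur).
        boolean shouldKeepLeft(T prev, T cur);
    }

    // Default compaction behavior: the latest value per key wins.
    static final CompactionStrategy<String> LATEST_WINS = (prev, cur) -> false;
    // Behavior discussed in this thread: the first value per key wins.
    static final CompactionStrategy<String> FIRST_WINS = (prev, cur) -> true;

    static Map<String, String> compact(List<Map.Entry<String, String>> log,
                                       CompactionStrategy<String> strategy) {
        Map<String, String> retained = new LinkedHashMap<>();
        for (var e : log) {
            String prev = retained.get(e.getKey());
            if (prev == null || !strategy.shouldKeepLeft(prev, e.getValue())) {
                retained.put(e.getKey(), e.getValue());
            }
        }
        return retained;
    }

    public static void main(String[] args) {
        var log = List.of(
            Map.entry("bundle-1", "own: broker-b"),
            Map.entry("bundle-1", "own: broker-c"),
            Map.entry("bundle-1", "own: broker-a"));
        System.out.println(compact(log, LATEST_WINS)); // {bundle-1=own: broker-a}
        System.out.println(compact(log, FIRST_WINS));  // {bundle-1=own: broker-b}
    }
}
```

With such a hook, a table view over the compacted BSC topic could retain the first value per key both before and after compaction, matching the conflict-resolution rule discussed earlier in the thread.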