Hi, please find my answers inline.

On Sun, Oct 23, 2022 at 7:11 PM PengHui Li <peng...@apache.org> wrote:

> Sorry, heesung,
>
> I think I used a confusing name "leader election".
> Actually, I meant to say "topic owner".
>
> From my understanding, the issue is if we are using the table view for a
> compacted topic.
> We will always get the last value of a key. But it will not work for the
> "broker ownership conflicts handling".
> First, we need to change the table view that is able to keep only the first
> value of a key.
> Even without compaction, the "broker ownership conflicts handling" will
> still work correctly, right?
>
>
Yes, BSC conflict resolution needs to take the first valid value (state
change) per key instead of just the latest value. For a non-compacted topic,
this table-view change alone (applying a strategic cache update) is enough to
serve its purpose.
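To make the "first valid value per key" semantics concrete, here is a minimal sketch (the class and method names are hypothetical, not the actual Pulsar TableView API): the cache only replaces an existing entry when a caller-supplied predicate says the incoming state change is a valid transition, so a conflicting second write for the same key is ignored.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiPredicate;

// Hypothetical sketch: a table-view cache that keeps the FIRST valid value
// per key instead of the latest, as BSC conflict resolution requires.
public class FirstValueTableView<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    // (currentValue, incomingValue) -> is this a valid state transition?
    private final BiPredicate<V, V> isValidTransition;

    public FirstValueTableView(BiPredicate<V, V> isValidTransition) {
        this.isValidTransition = isValidTransition;
    }

    // Apply an incoming message; the first value for a key always wins,
    // and later values are accepted only if the transition is valid.
    public void accept(K key, V value) {
        cache.compute(key, (k, current) ->
            (current == null || isValidTransition.test(current, value))
                ? value : current);
    }

    public V get(K key) {
        return cache.get(key);
    }
}
```

With a predicate that rejects everything, this degenerates to pure first-write-wins; the real channel would instead validate transitions such as released-to-assigned.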


> But if the table view works on a compaction topic. The table view will show
> the last value of a key after the
> compaction. So you want also to change the topic compaction to make sure
> the table view will always show the
> first value of a key.
>
> Yes.


> Maybe I missed something here.
>
> My point is if we can just write the owner(final, the first value of the
> key) broker back to the topic.
> So that the table view will always show the first value of the key before
> the topic compaction or after the topic compaction.
>
>
But how do we resolve conflicts if the tail messages of the topic are
non-terminal states? For example:

1. bundle 1 assigned by broker 1 // in the process of assignment
2. bundle 1 assigned by broker 2
3. bundle 2 released by broker 1 // in the process of transfer
4. bundle 2 assigned by broker 1
5. bundle 3 splitting by broker 1 // in the process of split
6. bundle 3 assigned by broker 2
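To illustrate why default compaction breaks this (a simplified sketch, not the PIP-215 API; types and names are made up): for the assignment race on bundle 1 above, last-value compaction keeps broker 2's write, while the conflict-resolution rule requires broker 1's first write to win. The real strategy must also validate transitions (e.g. released-to-assigned for bundle 2 is legitimate), which a naive first-write-wins rule would mishandle.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration: default compaction keeps the LAST value per
// key, but BSC conflict resolution needs the FIRST value to survive.
public class CompactionComparison {
    record Msg(String key, String value) {}

    static Map<String, String> compactLastWins(List<Msg> log) {
        Map<String, String> m = new LinkedHashMap<>();
        for (Msg msg : log) m.put(msg.key(), msg.value()); // later writes overwrite
        return m;
    }

    static Map<String, String> compactFirstWins(List<Msg> log) {
        Map<String, String> m = new LinkedHashMap<>();
        for (Msg msg : log) m.putIfAbsent(msg.key(), msg.value()); // first write sticks
        return m;
    }

    public static void main(String[] args) {
        List<Msg> log = List.of(
            new Msg("bundle-1", "assigned:broker-1"),
            new Msg("bundle-1", "assigned:broker-2"));
        System.out.println(compactLastWins(log).get("bundle-1"));  // assigned:broker-2
        System.out.println(compactFirstWins(log).get("bundle-1")); // assigned:broker-1
    }
}
```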


Regards,
Heesung


> Thanks,
> Penghui
>
> On Sat, Oct 22, 2022 at 12:23 AM Heesung Sohn
> <heesung.s...@streamnative.io.invalid> wrote:
>
> > Hi Penghui,
> >
> > I put my answers inline.
> >
> > On Thu, Oct 20, 2022 at 5:11 PM PengHui Li <peng...@apache.org> wrote:
> >
> > > Hi Heesung.
> > >
> > > Is it possible to send the promoted value to the topic again to achieve
> > > eventual consistency?
> > >
> >
> > Yes, as long as the state change is valid, BSC will accept it and
> broadcast
> > it to all brokers.
> >
> >
> > > For example:
> > >
> > > We have 3 brokers, broker-a, broker-b, and broker-c
> > > The message for leader election could be "own: broker-b", "own:
> > broker-c",
> > > "own: broker-a"
> > > The broker-b will win in the end.
> > >
> > The broker-b can write a new message "own: broker-b" to the topic. After
> > > the topic compaction.
> > > Only the broker-b will be present in the topic. Does it work?
> >
> >
> > The proposal does not use a topic for leader election because of the
> > circular dependency. The proposal uses the metadata store, zookeeper, to
> > elect the leader broker(s) of BSC.
> > This part is explained in the "Bundle State Channel Owner Selection and
> > Discovery" section in pip-192.
> >
> > *Bundle State Channel Owner Selection and Discovery*
> >
> > *Bundle State Channel(BSC) is another topic, and because of its circular
> > dependency, we can't use the BundleStateChannel to find the owner broker
> of
> > the BSC topic. For example, when a cluster starts, each broker needs to
> > initiate BSC TopicLookUp(to find the owner broker) in order to consume
> the
> > messages in BSC. However, initially, each broker does not know which
> broker
> > owns the BSC.*
> >
> > *The ZK leader election can be a good option to break this circular
> > dependency, like the followings.*
> > *Channel Owner Selection*
> >
> > *The cluster can use the ZK leader election to select the owner broker.
> If
> > the owner becomes unavailable, one of the followers will become the new
> > owner. We can elect the owner for each bundle state channel partition.*
> > *Channel Owner Discovery*
> >
> > *Then, in brokers’ TopicLookUp logic, we will add a special case to
> return
> > the current leader(the elected BSC owner) for the BSC topics.*
> >
> >
> >
> > >
> > > Maybe I missed something.
> > >
> > > Thanks,
> > > Penghui
> > >
> > > On Thu, Oct 20, 2022 at 1:30 AM Heesung Sohn
> > > <heesung.s...@streamnative.io.invalid> wrote:
> > >
> > > > Oops.
> > > > I forgot to mention another important item. I added it below(in
> bold).
> > > >
> > > > Pros:
> > > > - It supports more distributed load balance operations(bundle
> > assignment)
> > > > in a sequentially consistent manner
> > > > - For really large clusters, by a partitioned system topic, BSC can
> be
> > > more
> > > > scalable than the current single-leader coordination solution.
> > > > - The load balance commands(across brokers) are sent via event
> > > > sourcing(more reliable/transparent/easy-to-track) instead of RPC with
> > > > retries.
> > > > *- Bundle ownerships can be cached in the topic table-view from BSC.
> > (no
> > > > longer needs to store bundle ownership in metadata store(ZK))*
> > > >
> > > > Cons:
> > > > - It is a new implementation and will require significant effort to
> > > > stabilize the new implementation.
> > > > (Based on our PoC code, I think the event sourcing handlers are
> easier
> > to
> > > > understand and follow the logic.
> > > > Also, this new load balancer will be pluggable(will be implemented in
> > new
> > > > classes), so it should not break the existing load balance logic.
> > > > Users will be able to configure old/new broker load balancer.)
> > > >
> > > > On Wed, Oct 19, 2022 at 10:17 AM Heesung Sohn <
> > > > heesung.s...@streamnative.io>
> > > > wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > On Wed, Oct 19, 2022 at 2:06 AM 丛搏 <congbobo...@gmail.com> wrote:
> > > > >
> > > > >> Hi, Heesung:
> > > > >> I have some doubts.
> > > > >> I review your PIP-192: New Pulsar Broker Load Balancer. I found
> that
> > > > >> unload topic uses the leader broker to do, (Assigning, Return)
> uses
> > > > >> the lookup request broker. why (Assigning, Return) not use a
> leader
> > > > >> broker?
> > > > >> I can think of a few reasons:
> > > > >> 1. reduce leader broker pressure
> > > > >> 2. does not strongly depend on the leader broker
> > > > >>
> > > > >> Yes, one of the goals of the PIP-192 is to distribute the load
> > balance
> > > > > logic to individual brokers (bundle assignment and bundle split).
> > > > >
> > > > > If (Assigning, Return) does not depend on the leader, it will bring
> > > > > the following problems:
> > > > >
> > > > >> If (Assigning, Return) does not depend on the leader, it will
> bring
> > > > >> the following problems:
> > > > >
> > > > > I assume what you meant by `(Assigning, Return) does not depend on
> > the
> > > > > leader` is the distributed topic assignment here(concurrent bundle
> > > > > assignment across brokers).
> > > > >
> > > > > 1. leader clear bundle op and (Assigning, Return) will do at the
> same
> > > > >> time, It will cause many requests to be retried, and the broker
> will
> > > > >> be in chaos for a long time.
> > > > >
> > > > > I assume `leader clear bundle op` means bundle unloading, and `many
> > > > > requests` means topic lookup requests(bundle assignment requests).
> > > > > The leader unloads only high-loaded bundles in the "Owned" state.
> So,
> > > the
> > > > > leader does not unload bundles that are in the process of
> assignment
> > > > states.
> > > > > Even if there are conflict state changes, only the first valid
> state
> > > > > change will be accepted(as explained in Conflict State
> > Resolution(Race
> > > > > Conditions section in the PIP)) in BSC.
> > > > >
> > > > > Also, another goal of this PIP-192 is to reduce client lookup
> > retries.
> > > In
> > > > > BSC, the client lookup response will be deferred(max x secs) until
> > the
> > > > > bundle state becomes finally "Owned".
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >> 2. bundle State Channel(BSC) owner depends on the leader broker,
> > this
> > > > >> also makes topic transfer strongly dependent on the leader.
> > > > >>
> > > > > BSC will use separate leader znodes to decide the owner brokers of
> > the
> > internal BSC system topic. As described in this section in the
> > PIP-192,
> > > > > "Bundle State and Load Data TableView Scalability",
> > > > > We could use a partitioned topic(configurable) for this BSC system
> > > topic.
> > > > > Then, there could be a separate owner broker for each partition
> > > > > (e.g. zk leader znodes, /loadbalance/leader/part-1-owner,
> > part-2-owner,
> > > > > ..etc).
> > > > >
> > > > >
> > > > >
> > > > >> 3. the code becomes more complex and harder to maintain
> > > > >>
> > > > >> What tradeoffs are the current implementations based on?
> > > > >>
> > > > >> Here are some Pros and Cons of BSC I can think of.
> > > > >
> > > > > Pros:
> > > > > - It supports more distributed load balance operations(bundle
> > > assignment)
> > > > > in a sequentially consistent manner
> > > > > - For really large clusters, by a partitioned system topic, BSC can
> > be
> > > > > more scalable than the current single-leader coordination solution.
> > > > > - The load balance commands(across brokers) are sent via event
> > > > > sourcing(more reliable/transparent/easy-to-track) instead of RPC
> with
> > > > > retries.
> > > > >
> > > > > Cons:
> > > > > - It is a new implementation and will require significant effort to
> > > > > stabilize the new implementation.
> > > > > (Based on our PoC code, I think the event sourcing handlers are
> > easier
> > > to
> > > > > understand and follow the logic.
> > > > > Also, this new load balancer will be pluggable(will be implemented
> in
> > > new
> > > > > classes), so it should not break the existing load balance logic.
> > > > > Users will be able to configure old/new broker load balancer.)
> > > > >
> > > > >
> > > > > Thank you for sharing your questions about PIP-192 here. But I
> think
> > > this
> > > > > PIP-215 is independent of PIP-192(though PIP-192 needs some of the
> > > > features
> > > > > in PIP-215).
> > > > >
> > > > > Thanks,
> > > > > Heesung
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >> Thanks,
> > > > >> bo
> > > > >>
> > > > >> Heesung Sohn <heesung.s...@streamnative.io.invalid> wrote on Wed,
> > > > >> Oct 19, 2022, at 07:54:
> > > > >> >
> > > > >> > Hi pulsar-dev community,
> > > > >> >
> > > > >> > I raised a pip to discuss : PIP-215: Configurable Topic
> Compaction
> > > > >> Strategy
> > > > >> >
> > > > >> > PIP link: https://github.com/apache/pulsar/issues/18099
> > > > >> >
> > > > >> > Regards,
> > > > >> > Heesung
> > > > >>
> > > > >
> > > >
> > >
> >
>
