Overall looks good. Can you please convert this to a PIP and open a PR for it? Then I think we can move on to implementation.
--
Yong

On Mon, 26 Jun 2023 at 15:23, Enrico Olivelli <eolive...@gmail.com> wrote:

> On Mon, 26 Jun 2023 at 09:21, Yubiao Feng
> <yubiao.f...@streamnative.io.invalid> wrote:
> >
> > Hi Yan, Asaf
> >
> > > I want to add only one step to your plan.
> > > If you introduce this flag in Y.X, then in Y.(X+1),
> > > let's remove this flag
> > > and keep the "true" value as the behavior.
> >
> > I agree with Asaf
>
> +1
>
> Enrico
>
> >
> > Thanks
> > Yubiao Feng
> >
> > On Mon, Jun 19, 2023 at 9:57 AM horizonzy <horizo...@apache.org> wrote:
> >
> > > Background
> > >
> > > Pulsar has two relevant features:
> > >
> > > - The first feature allows users to set group and rack information for
> > >   bookies using pulsar-admin bookies set-bookie-rack.
> > >
> > >   Here, users assign bookie1 to bookie5 to the default group and
> > >   bookie6 to bookie10 to the share group using commands. They don't
> > >   care about rack information; they only care about which group each
> > >   bookie belongs to.
> > >
> > >   default={bookie1:3181=BookieInfoImpl(rack=default-rack, hostname=null),
> > >   bookie2:3181=BookieInfoImpl(rack=default-rack, hostname=null),
> > >   bookie3:3181=BookieInfoImpl(rack=default-rack, hostname=null),
> > >   bookie4:3181=BookieInfoImpl(rack=default-rack, hostname=null),
> > >   bookie5:3181=BookieInfoImpl(rack=default-rack, hostname=null)}
> > >
> > >   _shared_={bookie6:3181=BookieInfoImpl(rack=default-rack, hostname=null),
> > >   bookie7:3181=BookieInfoImpl(rack=default-rack, hostname=null),
> > >   bookie8:3181=BookieInfoImpl(rack=default-rack, hostname=null),
> > >   bookie9:3181=BookieInfoImpl(rack=default-rack, hostname=null),
> > >   bookie10:3181=BookieInfoImpl(rack=default-rack, hostname=null)}
> > >
> > > - The second feature allows users to set the priority of traffic for a
> > >   namespace, where traffic is directed to the primary group first and
> > >   then to the secondary group.
> > >   Users can set this priority using pulsar-admin
> > >   ns-isolation-policy set --namespaces public/default --primary "group"
> > >   --secondary "group".
> > >
> > >   Here, users set the primary group of the /public/default namespace to
> > >   "share" using a command.
> > >
> > >   {
> > >     "bookkeeperAffinityGroupPrimary" : "share"
> > >   }
> > >
> > >   After this is applied, all traffic under the /public/default
> > >   namespace will be directed to bookie6-10 in the "share" group.
> > >
> > > Drawbacks
> > >
> > > After a period of time, users added some new bookies [bk11, bk12, bk13,
> > > bk14, bk15] to the bookie cluster and found that some traffic under the
> > > /public/default namespace was directed to the newly added machines.
> > > After investigation, we eventually found that this was a defect in the
> > > working mechanism of bookkeeperAffinityGroupPrimary.
> > >
> > > *bookkeeperAffinityGroupPrimary work mechanism*
> > >
> > > All bookies in the cluster: bk1-bk15.
> > >
> > > Here are the steps the broker follows to pick bookies:
> > >
> > > 1. Get the bookie rack info config: default: [bk1, bk2, bk3, bk4, bk5];
> > >    share: [bk6, bk7, bk8, bk9, bk10].
> > > 2. Exclude the bookies that belong to a group other than the
> > >    bookkeeperAffinityGroupPrimary (share).
> > > 3. Exclude the default group bookies [bk1, bk2, bk3, bk4, bk5].
> > > 4. Pick bookies from the remaining bookies [bk6, bk7, bk8, bk9, bk10,
> > >    bk11, bk12, bk13, bk14, bk15].
> > >
> > > Therefore, some traffic may go to bk11-bk15, which is not what the
> > > users expect. The reason is that the new bookies, bk11 to bk15, did not
> > > have rack information set and were not part of any group.
> > >
> > > We provided a workaround for users to set the rack information for bk11
> > > to bk15 in advance using the command pulsar-admin bookies
> > > set-bookie-rack before starting them.
> > > After users adopted this workaround, traffic was routed as expected.
> > >
> > > For users, this may be a bit inconvenient, as they need to set rack
> > > information in advance before bringing new bookies online. In scenarios
> > > where there are strict limitations on traffic, if the bookie operation
> > > and maintenance personnel overlook this step, it could cause problems.
> > >
> > > Improvement
> > >
> > > I would like to introduce a new configuration, strict, for
> > > bookkeeperAffinityGroupPrimary. The default value for this
> > > configuration is false, which means that for existing users upgrading
> > > to the new version, the logic will remain the same and bookies without
> > > rack information will not be constrained.
> > >
> > > If users manually set strict to true using the command pulsar-admin
> > > ns-isolation-policy set --namespaces public/default --primary "group"
> > > --secondary "group" --strict true, then when the broker selects a
> > > bookie, it will only choose from the bookies in the primary group. If
> > > there are not enough bookies in the primary group, it will choose from
> > > the bookies in the secondary group. If there are not enough bookies in
> > > either group, an exception will be thrown. Bookies without rack
> > > information set using pulsar-admin bookies set-bookie-rack will not be
> > > selected.
> > >
> > > Compatibility
> > >
> > > When users upgrade from the old version to the new version, the working
> > > mechanism of bookkeeperAffinityGroupPrimary remains the same as before.
> > > When users upgrade to the new version, set strict to true using the
> > > command pulsar-admin ns-isolation-policy set --namespaces
> > > public/default --primary "group" --secondary "group" --strict true, and
> > > then roll back to the old version, the broker should still be able to
> > > correctly parse the ns-isolation-policy configuration.
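
For reference, the selection rules described in the quoted proposal can be sketched roughly as below. This is a minimal Python illustration and not Pulsar or BookKeeper code: the function eligible_bookies, the exception NotEnoughBookiesError, and the needed parameter are hypothetical names introduced here. It assumes that in strict mode the secondary group is added to the primary group when the primary alone is too small, and it does not model secondary-group handling in the non-strict path, since the thread only describes the primary-group steps there.

```python
# Illustrative sketch only; all names here are hypothetical, not Pulsar APIs.

class NotEnoughBookiesError(Exception):
    """Raised when fewer bookies are eligible than were requested."""


def eligible_bookies(all_bookies, groups, primary, secondary=None,
                     needed=1, strict=False):
    """Return the bookies the broker may choose from for a namespace.

    all_bookies: every bookie in the cluster, in a fixed order
    groups: dict of group name -> bookies assigned via set-bookie-rack
    """
    def members(group):
        # Bookies explicitly assigned to `group`, kept in cluster order.
        assigned = set(groups.get(group, [])) if group else set()
        return [b for b in all_bookies if b in assigned]

    if strict:
        # Proposed behavior: only grouped bookies are ever eligible.
        candidates = members(primary)
        if len(candidates) < needed:
            # Assumption: fall back by adding the secondary group.
            candidates = candidates + [
                b for b in members(secondary) if b not in candidates
            ]
    else:
        # Behavior described in the thread: only bookies that belong to
        # some other group are excluded, so ungrouped bookies stay eligible.
        in_other_groups = {
            b for name, bs in groups.items() if name != primary for b in bs
        }
        candidates = [b for b in all_bookies if b not in in_other_groups]

    if len(candidates) < needed:
        raise NotEnoughBookiesError(
            f"need {needed} bookies, only {len(candidates)} eligible"
        )
    return candidates


cluster = [f"bk{i}" for i in range(1, 16)]
groups = {
    "default": [f"bk{i}" for i in range(1, 6)],
    "share": [f"bk{i}" for i in range(6, 11)],
}

# Non-strict: the ungrouped bk11-bk15 leak into the candidate set.
print(eligible_bookies(cluster, groups, primary="share"))
# Strict: only bk6-bk10 remain eligible.
print(eligible_bookies(cluster, groups, primary="share", strict=True))
```

With the cluster from the example (bk1-bk15, two groups), the non-strict call includes the ungrouped bookies bk11-bk15, while the strict call is limited to the "share" group, which is the difference the proposal is after.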