Thanks Yang and Chesnay for your feedback. I'm gonna go ahead and start a voting thread as this discussion thread is already open for some time.
Best, Matthias On Wed, Jan 25, 2023 at 4:08 AM Yang Wang <wangyang0...@apache.org> wrote: > Having the *start()* in *LeaderContender* interface and bringing back the > *LeaderElection* with some new methods make sense to me. > > I have no more concerns now. > > > > - *LeaderContender*: The LeaderContender is integrated as usual except > > that it accesses the LeaderElection instead of the > LeaderElectionService. > > It's going to call startLeaderElection(LeaderContender) where, > previously, > > LeaderElectionService.start(LeaderContender) was called. > > > > nit: we call the *LeaderElection#startLeaderElection()*, not the > *LeaderElection#startLeaderElection(LeaderContender)*. Because we have > already set the leaderContender in the > *LeaderElection#register(LeaderContender)*. > > > Best, > Yang > > > Chesnay Schepler <ches...@apache.org> 于2023年1月23日周一 23:16写道: > > > Thanks for updating the design. From my side this looks good. > > > > On 18/01/2023 17:59, Matthias Pohl wrote: > > > After another round of discussion, I came up with a (hopefully) final > > > proposal. The previously discussed approach was still not optimal > because > > > the contender ID lived in the LeaderContender even though it is > actually > > > LeaderElectionService-internal knowledge. Fixing that helped fix the > > > overall architecture. Additionally, it brought back the LeaderElection > > > interface with slightly different methods. > > > > > > I updated the "Code Cleanup: Merge > MultipleComponentLeaderElectionService > > > into LeaderElectionService" section and moved the old proposal into the > > > section for rejected alternatives. Feel free to have another look at > the > > > updated version [1]. > > > > > > Matthias > > > > > > [1] > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-285%3A+Refactoring+LeaderElection+to+make+Flink+support+multi-component+leader+election+out-of-the-box > > > > > > On Wed, Jan 18, 2023 at 1:40 PM Matthias Pohl <matthias.p...@aiven.io> > > > wrote: > > > > > >> Thanks for participating in the discussion, Yang & Chesnay. > > LeaderElection > > >> interface extension gave me a headache as well. I added it initially > > >> because I thought it would be of more value. But essentially, it > doesn't > > >> help but make the code harder to understand (as your questions > > rightfully > > >> point out). I agree that the FLIP is good enough without this > > extension. I > > >> moved it into the Rejected Alternatives section of the FLIP and would > > >> propose going ahead without it. > > >> > > >> I will answer your questions about the LeaderElection extension, > anyway: > > >> > > >> BTW, if the *LeaderElectionService#register(return LeaderElection)* > and > > >>> *LeaderElectionService#onGrantLeadership* are guarded by a same lock, > > then > > >>> we could ensure that the leaderElection in *LeaderContender* is > always > > >>> non-null when it tries to confirm the leadership. And then we do not > > need > > >>> the > > >>> *LeaderContender#initializeLeaderElection*. Right? > > >> > > >> No, we still would need LeaderContender#initializeLeaderElection > because > > >> the LeaderElectionService needs to be capable of setting the > > LeaderElection > > >> within the LeaderContender before triggering the process for granting > > the > > >> leadership. This all needs to happen within the > > >> LeaderElectionService#register(LeaderContender). It's indepent of the > > lock. > > >> > > >> With the extension, how does the leader contender get access to the > > >>> LeaderElection? I would've assumed that LEService returns a > > LeaderElection > > >>> when register is called, but according to the diagram this method > > doesn't > > >>> return anything. Is that what initiateLeaderElection is doing? > > >> > > >> Correct. My initial plan was to make > > >> LeaderElectionService#register(LeaderContender) return the > > LeaderElection > > >> instance. That method could have been called within the > LeaderContender. > > >> But this approach has the flaw that LeaderContender would be in charge > > >> within this control flow where, actually, we would want > > >> LeaderElectionService to be still in charge to trigger the process for > > >> granting the leadership. This required the > > >> LeaderContender.initializeLeaderElection(LeaderElection) method to be > > added > > >> to enable the LeaderElectionService to do the initialization. I added > a > > >> comment to the corresponding class diagram to make this clearer. > > >> > > >> The DefaultLeaderElection will rely on package-private methods of the > > >>> DLEService to handle confirm/hasLeadership calls? > > >> > > >> Correct. I added the missing package-private methods to the class > > diagram > > >> in the FLIP to clear things up. > > >> > > >> On Wed, Jan 18, 2023 at 11:47 AM Chesnay Schepler <ches...@apache.org > > > > >> wrote: > > >> > > >>> There are a lot of good things in this, and until the Extension bit > I'm > > >>> fully on board. > > >>> > > >>> With the extension, how does the leader contender get access to the > > >>> LeaderElection? I would've assumed that LEService returns a > > >>> LeaderElection when register is called, but according to the diagram > > >>> this method doesn't return anything. Is that what > > initiateLeaderElection > > >>> is doing? > > >>> > > >>> The DefaultLeaderElection will rely on package-private methods of the > > >>> DLEService to handle confirm/hasLeadership calls? > > >>> > > >>> I don't fully understand why LContender#initializeLeaderElection is > > >>> required. > > >>> > > >>> On 05/01/2023 14:49, Matthias Pohl wrote: > > >>>> Hi everyone, > > >>>> I brought up FLINK-26522 [1] in the mailing list discussion about > > >>>> consolidating the HighAvailabilityServices interfaces [2], > previously. > > >>>> There, it was concluded that the community still wants the ability > to > > >>> have > > >>>> per-component leader election and, therefore, keep the > > >>>> HighAvailabilityServices interface as is. I went back to work on > > >>>> FLINK-26522 [1] to figure out how we can simplify the current > codebase > > >>>> keeping the decision in mind. > > >>>> > > >>>> I wanted to handle FLINK-26522 [1] as a follow-up cleanup task of > > >>>> FLINK-24038 [3]. But while working on it, I realized that even > > >>> FLINK-24038 > > >>>> [3] shouldn't have been handled without a FLIP. The per-process > leader > > >>>> election which was introduced in FLINK-24038 [3] changed the > ownership > > >>> of > > >>>> certain components. This is actually a change that should have been > > >>>> discussed in the mailing list and deserved a FLIP. To overcome this > > >>>> shortcoming of FLINK-24038 [3], I decided to prepare FLIP-285 [4] to > > >>>> provide proper documentation of what happened in FLINK-24038 and > what > > >>> will > > >>>> be manifested with resolving its follow-up FLINK-26522 [1]. > > >>>> > > >>>> Conceptually, this FLIP proposes moving away from Flink's support > for > > >>>> single-contender LeaderElectionServices and introducing > > multi-contender > > >>>> support by disconnecting the HA-backend leader election lifecycle > from > > >>> the > > >>>> LeaderContender's lifecycle. This allows us to provide > LeaderElection > > >>> per > > >>>> component (as it was requested in [2]) but also enables us to > utilize > > a > > >>>> single leader election for multiple components/contenders as well > > >>> without > > >>>> the complexity of the code that was introduced by FLINK-24038 [3]. > > >>>> > > >>>> I'm looking forward to your comments. > > >>>> > > >>>> Matthias > > >>>> > > >>>> [1] https://issues.apache.org/jira/browse/FLINK-26522 > > >>>> [2] > https://lists.apache.org/thread/9oy2ml9s3j1v6r77h31sndhc3gw57cfm > > >>>> [3] https://issues.apache.org/jira/browse/FLINK-24038 > > >>>> [4] > > >>>> > > >>> > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-285%3A+Refactoring+LeaderElection+to+make+Flink+support+multi-component+leader+election+out-of-the-box > > >>> > > > > >