Hi Jane, Thanks for your comments!
I guess there is no problem with the word 'on' in this scenario, since it is an event-driven-like execution model. I think the word 'hasNext' may be misleading since there is a 'hasNext' in a typical iterator which returns a boolean for the existence of a next element. I think the GPT may also be misled by this word, and misunderstood the meaning of this interface and therefore giving the wrong suggestions :) . Actually this interface is introduced to iterating elements (like next()) instead of checking the existence. I think the name `onNext()` is more suitable, WDYT? Or to be more explicit, we can add 'Compose' or 'Apply' to interfaces (onNextCompose, onNextAccept) matching the functions of corresponding APIs from 'StateFuture'. WDYT? But I'd prefer the former since it is more simple. Best, Zakelly On Tue, Mar 19, 2024 at 6:09 PM Jane Chan <qingyue....@gmail.com> wrote: > Hi Zakelly, > > Thanks for bringing this discussion. > > I'm +1 for the overall API design, except for one minor comment about the > name of StateIterator#onHasNext since I feel it is a little bit > unintuitive. Meanwhile, I asked the opinion from GPT, here's what it said > > The prefix "on" is commonly used in event-driven programming to denote an > > event handler, not to check a condition. For instance, in JavaScript, you > > might have onClick to handle click events. Therefore, using "on" may be > > misleading if the method is being used to check for the existence of a > next > > element. > > For an async iterator, you'd want a name that clearly conveys that the > > method will check for the next item asynchronously and return a promise > or > > some form of future result. In JavaScript, which supports async > iteration, > > the standard method for this is next(), which when used with async > > iterators, returns a promise that resolves to an object with properties > > value and done. > > Here are a couple of better alternatives: > > hasNextAsync: This name clearly states that the function is an asynchronous > > version of the typical hasNext method found in synchronous iterators. > > nextExists: This name suggests the method checks for the existence of a > > next item, without the potential confusion of event handler naming > > conventions. > > > > WDYT? > > Best, > Jane > > On Tue, Mar 19, 2024 at 5:47 PM Zakelly Lan <zakelly....@gmail.com> wrote: > > > Hi everyone, > > > > Thanks for your valuable feedback! > > > > The discussions were vibrant and have led to significant enhancements to > > this FLIP. With this progress, I'm looking to initiate the voting in 72 > > hours. > > > > Please let me know if you have any concerns, thanks! > > > > > > Best, > > Zakelly > > > > On Tue, Mar 19, 2024 at 5:35 PM Zakelly Lan <zakelly....@gmail.com> > wrote: > > > > > Hi Yue, > > > > > > Thanks for your comments! > > > > > > 1. Is it possible for all `FutureUtils` in Flink to reuse the same util > > >> class? > > > > > > Actually, the `FutureUtils` here is a new util class that will share > the > > > same package path with the `StateFuture`. Or I'd be fine renaming it > > > 'StateFutureUtils'. > > > > > > 2. It seems that there is no concept of retry, timeout, or delay in > your > > >> async state api design . Do we need to provide such capabilities like > > >> `orTimeout` 、`completeDelayed`? > > >> > > > For ease of use, we do not provide such APIs allowing users to > customize > > > the behavior on timeout or retry. We may introduce a retry mechanism in > > the > > > framework enabled by configuration. And we will hide the 'complete' and > > > related APIs of StateFuture from users, since the completion of these > > > futures is totally managed by the execution framework. > > > > > > > > > Best, > > > Zakelly > > > > > > > > > > > > On Tue, Mar 19, 2024 at 5:20 PM yue ma <mayuefi...@gmail.com> wrote: > > > > > >> Hi Zakelly, > > >> > > >> Thanks for your proposal. The FLIP looks good to me +1! I'd like to > ask > > >> some minor questions > > >> I found that there is also a definition of class `FutureUtils` under > > `org. > > >> apache. flink. util. concurrent` which seems to offer more interfaces. > > My > > >> question is: > > >> 1. Is it possible for all `FutureUtils` in Flink to reuse the same > util > > >> class? > > >> 2. It seems that there is no concept of retry, timeout, or delay in > your > > >> async state api design . Do we need to provide such capabilities like > > >> `orTimeout` 、`completeDelayed`? > > >> > > >> Jing Ge <j...@ververica.com.invalid> 于2024年3月13日周三 20:00写道: > > >> > > >> > indeed! I missed that part. Thanks for the hint! > > >> > > > >> > Best regards, > > >> > Jing > > >> > > > >> > On Wed, Mar 13, 2024 at 6:02 AM Zakelly Lan <zakelly....@gmail.com> > > >> wrote: > > >> > > > >> > > Hi Jing, > > >> > > > > >> > > The deprecation and removal of original APIs is beyond the scope > of > > >> > > current FLIP, but I do add/highlight such information under > > >> > "Compatibility, > > >> > > Deprecation, and Migration Plan" section. > > >> > > > > >> > > > > >> > > Best, > > >> > > Zakelly > > >> > > > > >> > > On Wed, Mar 13, 2024 at 9:18 AM Yunfeng Zhou < > > >> > flink.zhouyunf...@gmail.com> > > >> > > wrote: > > >> > > > > >> > >> Hi Zakelly, > > >> > >> > > >> > >> Thanks for your responses. I agree with it that we can keep the > > >> design > > >> > >> as it is for now and see if others have any better ideas for > these > > >> > >> questions. > > >> > >> > > >> > >> Best, > > >> > >> Yunfeng > > >> > >> > > >> > >> On Tue, Mar 12, 2024 at 5:23 PM Zakelly Lan < > zakelly....@gmail.com > > > > > >> > >> wrote: > > >> > >> > > > >> > >> > Hi Xuannan, > > >> > >> > > > >> > >> > Thanks for your comments, I modified the FLIP accordingly. > > >> > >> > > > >> > >> > Hi Yunfeng, > > >> > >> > > > >> > >> > Thanks for sharing your opinions! > > >> > >> > > > >> > >> >> Could you provide some hint on use cases where users need to > mix > > >> sync > > >> > >> >> and async state operations in spite of the performance > > regression? > > >> > >> >> This information might help address our concerns on design. If > > the > > >> > >> >> mixed usage is simply something not recommended, I would > prefer > > to > > >> > >> >> prohibit such usage from API. > > >> > >> > > > >> > >> > In fact, there is no scenario where users MUST use the sync > APIs, > > >> but > > >> > >> it is much easier to use for those who are not familiar with > > >> > asynchronous > > >> > >> programming. If they want to migrate their job from Flink 1.x to > > 2.0 > > >> > >> leveraging some benefits from asynchronous APIs, they may try the > > >> mixed > > >> > >> usage. It is not user-friendly to directly throw exceptions at > > >> runtime, > > >> > I > > >> > >> think our better approach is to warn users and recommend avoiding > > >> this. > > >> > I > > >> > >> added an example in this FLIP. > > >> > >> > > > >> > >> > Well, I do not insist on allowing mixed usage of APIs if others > > >> reach > > >> > >> an agreement that we won't support that . I think the most > > important > > >> is > > >> > to > > >> > >> keep the API easy to use and understand, thus I propose a unified > > >> state > > >> > >> declaration and explicit meaning in method name. WDYT? > > >> > >> > > > >> > >> >> Sorry I missed the new sink API. I do still think that it > would > > be > > >> > >> >> better to make the package name more informative, and ".v2." > > does > > >> not > > >> > >> >> contain information for new Flink users who did not know the > v1 > > of > > >> > >> >> state API. Unlike internal implementation and performance > > >> > >> >> optimization, API will hardly be compromised for now and > updated > > >> in > > >> > >> >> future, so I still suggest we improve the package name now if > > >> > >> >> possible. But given the existing practice of sink v2 and > > >> > >> >> AbstractStreamOperatorV2, the current package name would be > > >> > acceptable > > >> > >> >> to me if other reviewers of this FLIP agrees on it. > > >> > >> > > > >> > >> > Actually, I don't like 'v2' either. So if there is another good > > >> name, > > >> > >> I'd be happy to apply. This is a compromise to the current > > situation. > > >> > Maybe > > >> > >> we could refine this after the retirement of original state APIs. > > >> > >> > > > >> > >> > > > >> > >> > Thanks & Best, > > >> > >> > Zakelly > > >> > >> > > > >> > >> > > > >> > >> > On Tue, Mar 12, 2024 at 1:42 PM Yunfeng Zhou < > > >> > >> flink.zhouyunf...@gmail.com> wrote: > > >> > >> >> > > >> > >> >> Hi Zakelly, > > >> > >> >> > > >> > >> >> Thanks for the quick response! > > >> > >> >> > > >> > >> >> > Actually splitting APIs into two sets ... warn them in > > runtime. > > >> > >> >> > > >> > >> >> Could you provide some hint on use cases where users need to > mix > > >> sync > > >> > >> >> and async state operations in spite of the performance > > regression? > > >> > >> >> This information might help address our concerns on design. If > > the > > >> > >> >> mixed usage is simply something not recommended, I would > prefer > > to > > >> > >> >> prohibit such usage from API. > > >> > >> >> > > >> > >> >> > In fact ... .sink2`. > > >> > >> >> > > >> > >> >> Sorry I missed the new sink API. I do still think that it > would > > be > > >> > >> >> better to make the package name more informative, and ".v2." > > does > > >> not > > >> > >> >> contain information for new Flink users who did not know the > v1 > > of > > >> > >> >> state API. Unlike internal implementation and performance > > >> > >> >> optimization, API will hardly be compromised for now and > updated > > >> in > > >> > >> >> future, so I still suggest we improve the package name now if > > >> > >> >> possible. But given the existing practice of sink v2 and > > >> > >> >> AbstractStreamOperatorV2, the current package name would be > > >> > acceptable > > >> > >> >> to me if other reviewers of this FLIP agrees on it. > > >> > >> >> > > >> > >> >> Best, > > >> > >> >> Yunfeng > > >> > >> >> > > >> > >> >> On Mon, Mar 11, 2024 at 5:27 PM Zakelly Lan < > > >> zakelly....@gmail.com> > > >> > >> wrote: > > >> > >> >> > > > >> > >> >> > Hi Yunfeng, > > >> > >> >> > > > >> > >> >> > Thanks for your comments! > > >> > >> >> > > > >> > >> >> > +1 for JingGe's suggestion to introduce an AsyncState API, > > >> instead > > >> > of > > >> > >> >> > > having both get() and asyncGet() in the same State class. > > As a > > >> > >> >> > > supplement to its benefits, this design could help avoid > > >> having > > >> > >> users > > >> > >> >> > > to use sync and async API in a mixed way (unless they > create > > >> > both a > > >> > >> >> > > State and an AsyncState from the same state descriptor), > > >> which is > > >> > >> >> > > supposed to bring suboptimal performance according to the > > >> FLIP's > > >> > >> >> > > description. > > >> > >> >> > > > >> > >> >> > > > >> > >> >> > Actually splitting APIs into two sets of classes also brings > > >> some > > >> > >> >> > difficulties. In this case, users must explicitly define > their > > >> > usage > > >> > >> before > > >> > >> >> > actually doing state access. It is a little strange that the > > >> user > > >> > can > > >> > >> >> > define a sync and an async version of State with the same > > name, > > >> > >> while they > > >> > >> >> > cannot allocate two async States with the same name. > > >> > >> >> > Another reason for distinguishing API by their method name > > >> instead > > >> > >> of class > > >> > >> >> > name is that users typically use the State instances to > access > > >> > state > > >> > >> but > > >> > >> >> > forget their type/class. For example: > > >> > >> >> > ``` > > >> > >> >> > SyncState a = getState(xxx); > > >> > >> >> > AsyncState b = getAsyncState(xxx); > > >> > >> >> > //... > > >> > >> >> > a.update(1); > > >> > >> >> > b.update(1); > > >> > >> >> > ``` > > >> > >> >> > Users are likely to think there is no difference between the > > >> > >> `a.update(1)` > > >> > >> >> > and `b.update(1)`, since they may forget the type for `a` > and > > >> `b`. > > >> > >> Thus I > > >> > >> >> > proposed to distinguish the behavior in method names. > > >> > >> >> > As for the suboptimal performance with mixed usage of sync > and > > >> > >> async, my > > >> > >> >> > proposal is to warn them in runtime. > > >> > >> >> > > > >> > >> >> > I noticed that the FLIP proposes to place the newly > introduced > > >> API > > >> > in > > >> > >> >> > > the package "org.apache.flink.api.common.state.v2", which > > >> seems a > > >> > >> >> > > little strange to me as there has not been such a naming > > >> pattern > > >> > >> >> > > ".v2." for packages in Flink. > > >> > >> >> > > > >> > >> >> > > > >> > >> >> > In fact, there are some similar existing patterns, like > > >> > >> >> > `org.apache.flink.streaming.api.functions.sink.v2` and > > >> > >> >> > `org.apache.flink.streaming.api.connector.sink2`. > > >> > >> >> > > > >> > >> >> > I would suggest discussing this topic > > >> > >> >> > > with the main authors of Datastream V2, like Weijie Guo, > so > > >> that > > >> > >> the > > >> > >> >> > > newly introduced APIs from both sides comply with a > unified > > >> > naming > > >> > >> >> > > style. > > >> > >> >> > > > >> > >> >> > I'm afraid we are facing a different situation with the > > >> Datastream > > >> > >> V2. For > > >> > >> >> > total reconstruction of Datastream API, it is big enough to > > >> build a > > >> > >> >> > seperate module and keep good package names. While for state > > >> APIs, > > >> > we > > >> > >> >> > should stay in the flink-core(-api) module alongside with > > other > > >> > >> >> > apis, currently I tend to compromise at the expense of > naming > > >> > style. > > >> > >> >> > > > >> > >> >> > > > >> > >> >> > Looking forward to hearing from you again! > > >> > >> >> > > > >> > >> >> > Thanks & Best, > > >> > >> >> > Zakelly > > >> > >> >> > > > >> > >> >> > On Mon, Mar 11, 2024 at 4:20 PM Yunfeng Zhou < > > >> > >> flink.zhouyunf...@gmail.com> > > >> > >> >> > wrote: > > >> > >> >> > > > >> > >> >> > > Hi Zakelly, > > >> > >> >> > > > > >> > >> >> > > Thanks for the proposal! The structure of the Async API > > >> generally > > >> > >> >> > > looks good to me. Some comments on the details of the > design > > >> are > > >> > as > > >> > >> >> > > follows. > > >> > >> >> > > > > >> > >> >> > > +1 for JingGe's suggestion to introduce an AsyncState API, > > >> > instead > > >> > >> of > > >> > >> >> > > having both get() and asyncGet() in the same State class. > > As a > > >> > >> >> > > supplement to its benefits, this design could help avoid > > >> having > > >> > >> users > > >> > >> >> > > to use sync and async API in a mixed way (unless they > create > > >> > both a > > >> > >> >> > > State and an AsyncState from the same state descriptor), > > >> which is > > >> > >> >> > > supposed to bring suboptimal performance according to the > > >> FLIP's > > >> > >> >> > > description. > > >> > >> >> > > > > >> > >> >> > > I noticed that the FLIP proposes to place the newly > > introduced > > >> > API > > >> > >> in > > >> > >> >> > > the package "org.apache.flink.api.common.state.v2", which > > >> seems a > > >> > >> >> > > little strange to me as there has not been such a naming > > >> pattern > > >> > >> >> > > ".v2." for packages in Flink. I would suggest discussing > > this > > >> > topic > > >> > >> >> > > with the main authors of Datastream V2, like Weijie Guo, > so > > >> that > > >> > >> the > > >> > >> >> > > newly introduced APIs from both sides comply with a > unified > > >> > naming > > >> > >> >> > > style. If we reach an agreement on the first comment, my > > >> personal > > >> > >> idea > > >> > >> >> > > is that we can place the AsyncState interfaces to > > >> > >> >> > > "org.apache.flink.api.common.state.async", and the > existing > > >> state > > >> > >> APIs > > >> > >> >> > > to "org.apache.flink.api.common.state" or > > >> > >> >> > > "org.apache.flink.api.common.state.sync". > > >> > >> >> > > > > >> > >> >> > > Best regards, > > >> > >> >> > > Yunfeng Zhou > > >> > >> >> > > > > >> > >> >> > > On Thu, Mar 7, 2024 at 4:48 PM Zakelly Lan < > > >> > zakelly....@gmail.com> > > >> > >> wrote: > > >> > >> >> > > > > > >> > >> >> > > > Hi devs, > > >> > >> >> > > > > > >> > >> >> > > > I'd like to start a discussion on a sub-FLIP of > FLIP-423: > > >> > >> Disaggregated > > >> > >> >> > > > State Storage and Management[1], which is a joint work > of > > >> Yuan > > >> > >> Mei, > > >> > >> >> > > Zakelly > > >> > >> >> > > > Lan, Jinzhong Li, Hangxiang Yu, Yanfei Lei and Feng > Wang: > > >> > >> >> > > > > > >> > >> >> > > > - FLIP-424: Asynchronous State APIs [2] > > >> > >> >> > > > > > >> > >> >> > > > This FLIP introduces new APIs for asynchronous state > > access. > > >> > >> >> > > > > > >> > >> >> > > > Please make sure you have read the FLIP-423[1] to know > the > > >> > whole > > >> > >> story, > > >> > >> >> > > and > > >> > >> >> > > > we'll discuss the details of FLIP-424[2] under this > mail. > > >> For > > >> > the > > >> > >> >> > > > discussion of overall architecture or topics related > with > > >> > >> multiple > > >> > >> >> > > > sub-FLIPs, please post in the previous mail[3]. > > >> > >> >> > > > > > >> > >> >> > > > Looking forward to hearing from you! > > >> > >> >> > > > > > >> > >> >> > > > [1] https://cwiki.apache.org/confluence/x/R4p3EQ > > >> > >> >> > > > [2] https://cwiki.apache.org/confluence/x/SYp3EQ > > >> > >> >> > > > [3] > > >> > >> https://lists.apache.org/thread/ct8smn6g9y0b8730z7rp9zfpnwmj8vf0 > > >> > >> >> > > > > > >> > >> >> > > > > > >> > >> >> > > > Best, > > >> > >> >> > > > Zakelly > > >> > >> >> > > > > >> > >> > > >> > > > > >> > > > >> > > >> > > >> -- > > >> Best, > > >> Yue > > >> > > > > > >