Hi, I am very happy to see opinions from different perspectives. That will help us understand the problem better. Thanks all for the informative discussion.
Let's look at the big picture and check the following facts together:

1. FLIP-27 was intended to solve some technical issues that are very difficult to solve with SourceFunction[1]. When we say "SourceFunction is easy", well, it depends. If we take a look at the implementation of the Kafka connector, we will see how complicated it is to build a serious connector for production with the old SourceFunction. To every problem there is a solution and to every solution there is a problem. The fact is that there is no perfect solution, only a feasible one. If we try to solve complicated problems, we have to expose some complexity. Compared to connectors for POCs, demos, and training (no offense), I would give higher priority to solving issues for connectors like the Kafka connector that are widely used in production. I think that should be one reason why FLIP-27 was designed and why the new source API went public.

2. FLIP-27 and the implementation were introduced roughly at the end of 2019 and went public on 19.04.2021, which means Flink has provided two different public/graduated source solutions for more than one year. On the day that the new source API went public, there should have been a consensus in the community that we should start the migration. The old SourceFunction interface, in the ideal case, should have been deprecated on that day; otherwise we should not have graduated the new source API, to avoid confusing (connector) developers[2].

3. It is true that the new source API is hard to understand and even hard to implement for simple cases. Thanks for the feedback. That is something we need to improve. The current design and implementation could be considered the low-level API. The next step is to create a high-level API to reduce some unnecessary complexity for those simple cases. But, IMHO, this should not be a reason to postpone the deprecation of the old SourceFunction APIs.

4. As long as the old SourceFunction is not marked as deprecated, developers will continue asking which one should be used. Let's take a concrete example. If a new connector is being developed now and the developer asks on the ML for a suggestion on choosing between the old and new source API, which one should we suggest? I think it should be the new Source API. If a brand-new connector has been developed with the old SourceFunction API without first asking for consensus in the community, and the developer wants to merge it into master, should we allow it? If the answers to all these questions point to the new Source API, the old SourceFunction is de facto already deprecated; it just has not been marked as @deprecated, which confuses developers even more.

Best regards,
Jing

[1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface
[2] https://lists.apache.org/thread/7okp4y46n3o3rx5mn0t3qobrof8zxwqs

On Wed, Jun 8, 2022 at 2:21 AM Alexander Fedulov <alexan...@ververica.com> wrote:

> Hey Austin,
>
> Since we are getting deeper into the implementation details of the DataGeneratorSource and it is not the main topic of this thread, I propose to move our discussion to where it belongs: [DISCUSS] FLIP-238 [1]. Could you please briefly formulate your requirements to make it easier for the others to follow? I am happy to continue this conversation there.
>
> [1] https://lists.apache.org/thread/7gjxto1rmkpff4kl54j8nlg5db2rqhkt
>
> Best,
> Alexander Fedulov
>
> On Tue, Jun 7, 2022 at 6:14 PM Austin Cawley-Edwards <austin.caw...@gmail.com> wrote:
>
> >
> > @Austin, in the FLIP I mentioned above [1], the user is expected to pass a MapFunction<Long, OUT> to the generator. I wonder if you could have your external client and polling logic wrapped in a custom MapFunction implementation class? Would that answer your needs or do you have some more sophisticated scenario in mind?
> >
> > At first glance, the FLIP looks good, but for this case, in regards to the map function, it leaves out 1) the ability to control polling intervals and 2) the ability to produce an unknown number of records, both per-poll and in overall boundedness. Do you think something like this could be built from the same pieces?
> > I'm also wondering what handles threading: is that on the user or is that part of the DataGeneratorSource?
> >
> > Best,
> > Austin
> >
> > On Tue, Jun 7, 2022 at 9:34 AM Alexander Fedulov <alexan...@ververica.com> wrote:
> >
> > > Hi everyone,
> > >
> > > Thanks for all the input and a lively discussion. It seems that there is a consensus that due to the inherent complexity of FLIP-27 sources we should provide more user-facing utilities to bridge the gap between the existing SourceFunction-based functionality and the new APIs.
> > >
> > > To start addressing this I picked the issue that David raised and many upvoted. Here is a proposal for the new DataGeneratorSource: FLIP-238 [1]. Please take a look, I am going to open a separate discussion thread on it shortly.
> > >
> > > Jing also raised some great points regarding the interfaces and subclasses. It seems to me that what might actually help is some sort of a "soft deprecation" concept and annotation. It could be used in places where we do not have an alternative implementation yet, but we clearly want to indicate that continuing to build on top of these interfaces is discouraged. The area of impact of deprecating all SourceFunction subclasses is rather big, and we can expect it to take a while. The hope would be that if in the meantime someone finds themselves using one of such old APIs, the "soft deprecation" annotation will be a clear indication and encouragement to work on introducing an alternative FLIP-27-based implementation instead.
> > >
> > > @Austin, in the FLIP I mentioned above [1], the user is expected to pass a MapFunction<Long, OUT> to the generator. I wonder if you could have your external client and polling logic wrapped in a custom MapFunction implementation class? Would that answer your needs or do you have some more sophisticated scenario in mind?
> > >
> > > [1] https://cwiki.apache.org/confluence/x/9Av1D
> > > Best,
> > > Alexander Fedulov
> > >
> > > On Mon, Jun 6, 2022 at 7:08 PM Austin Cawley-Edwards <austin.caw...@gmail.com> wrote:
> > >
> > > > Thanks for the nice discussion all.
> > > >
> > > > I was recently trying to implement a very simple polling source and would've loved a higher-level base to work from. I'm wondering if in addition to the data generator use cases, it would be good to support a simple non-parallel polling abstraction to make it easier to, for instance, start prototyping with data in existing APIs without adding a Kafka or such in the middle.
> > > >
> > > > Best,
> > > > Austin
> > > >
> > > > On Mon, Jun 6, 2022 at 10:02 AM tison <wander4...@gmail.com> wrote:
> > > >
> > > > > Well. It's a bit off-topic. As for deprecating SourceFunction as the FLIP-27 series of work goes ahead, +1 from my side. It's significant work towards the batch and streaming unification effort :)
> > > > >
> > > > > Best,
> > > > > tison.
> > > > >
> > > > >
> > > > > On Mon, Jun 6, 2022 at 9:54 PM tison <wander4...@gmail.com> wrote:
> > > > >
> > > > > > The starting point of the version bump and removal question is that downstream projects may have a tough time adapting to new interfaces while Flink stays on 1.x versions, so users may expect it to be an easy task. From my experience, it's really challenging to maintain compatibility between multiple versions of Flink when significant changes are made within the same 1.x version series - users may not be aware that it's almost a major version bump.
> > > > > >
> > > > > > Best,
> > > > > > tison.
> > > > > >
> > > > > >
> > > > > > On Mon, Jun 6, 2022 at 9:51 PM tison <wander4...@gmail.com> wrote:
> > > > > >
> > > > > >> One question from my side:
> > > > > >>
> > > > > >> As SourceFunction is a @Public interface, we cannot remove it before doing a major version bump (Flink 2.0).
> > > > > >>
> > > > > >> Of course it's not a blocker to make such deprecation and let the new interface step in. My question is whether we have a plan to finally remove the deprecated interfaces, or postpone it until there is a clear plan for Flink 2.0?
> > > > > >>
> > > > > >> Best,
> > > > > >> tison.
> > > > > >>
> > > > > >>
> > > > > >> On Mon, Jun 6, 2022 at 9:35 PM David Anderson <dander...@apache.org> wrote:
> > > > > >>
> > > > > >>> >
> > > > > >>> > David, can you elaborate why you need watermark generation in the source for your data generators?
> > > > > >>>
> > > > > >>>
> > > > > >>> The training exercises should strive to provide examples of best practices. If the exercises and their solutions use
> > > > > >>>
> > > > > >>>     env.fromSource(source, WatermarkStrategy.noWatermarks(), "name-of-source")
> > > > > >>>         .map(...)
> > > > > >>>         .assignTimestampsAndWatermarks(...)
> > > > > >>>
> > > > > >>> this will help establish this anti-pattern as the normal way of doing things.
> > > > > >>>
> > > > > >>> Most new Flink users are using a KafkaSource with a noWatermarks strategy and a SimpleStringSchema, followed by a map that does the real deserialization, followed by the real watermarking -- because they aren't seeing examples that teach how these interfaces are meant to be used.
> > > > > >>>
> > > > > >>> When we redo the sources used in training exercises, I want to avoid these pitfalls.
> > > > > >>>
> > > > > >>> David
> > > > > >>>
> > > > > >>> On Mon, Jun 6, 2022 at 9:12 AM Konstantin Knauf <kna...@apache.org> wrote:
> > > > > >>>
> > > > > >>> > Hi everyone,
> > > > > >>> >
> > > > > >>> > very interesting thread. The proposal for deprecation seems to have sparked a very important discussion. Do we know what users struggle with specifically?
> > > > > >>> >
> > > > > >>> > Speaking for myself, when I upgraded flink-faker to the new Source API an unbounded version of the NumberSequenceSource would have been all I needed, but that's just the data generator use case. I think that one could be solved quite easily. David, can you elaborate why you need watermark generation in the source for your data generators?
> > > > > >>> >
> > > > > >>> > Cheers,
> > > > > >>> >
> > > > > >>> > Konstantin
> > > > > >>> >
> > > > > >>> >
> > > > > >>> > On Sun, Jun 5, 2022 at 5:48 PM Piotr Nowojski <pnowoj...@apache.org> wrote:
> > > > > >>> >
> > > > > >>> > > Also +1 to what David has written. But it doesn't mean we should be waiting indefinitely to deprecate SourceFunction.
> > > > > >>> > >
> > > > > >>> > > Best,
> > > > > >>> > > Piotrek
> > > > > >>> > >
> > > > > >>> > > On Sun, Jun 5, 2022 at 4:46 PM Jark Wu <imj...@gmail.com> wrote:
> > > > > >>> > >
> > > > > >>> > > > +1 to David's point.
> > > > > >>> > > >
> > > > > >>> > > > Usually, when we deprecate some interfaces, we should point users to the recommended alternatives. However, implementing the new Source interface for some simple scenarios is too challenging and complex. We also found it isn't easy to push the internal connector to upgrade to the new Source because "FLIP-27 are hard to understand, while SourceFunction is easy".
> > > > > >>> > > >
> > > > > >>> > > > +1 to make implementing a simple Source easier before deprecating SourceFunction.
> > > > > >>> > > >
> > > > > >>> > > > Best,
> > > > > >>> > > > Jark
> > > > > >>> > > >
> > > > > >>> > > >
> > > > > >>> > > > On Sun, 5 Jun 2022 at 07:29, Jingsong Lee <lzljs3620...@apache.org> wrote:
> > > > > >>> > > >
> > > > > >>> > > > > +1 to David and Ingo.
> > > > > >>> > > > >
> > > > > >>> > > > > Before deprecating and removing SourceFunction, we should have some easier APIs to wrap the new Source; the cost of writing a new Source is too high now.
> > > > > >>> > > > >
> > > > > >>> > > > >
> > > > > >>> > > > > On Sun, Jun 5, 2022 at 5:32 AM Ingo Bürk <airbla...@apache.org> wrote:
> > > > > >>> > > > >
> > > > > >>> > > > > > I +1 everything David said. The new Source API raised the complexity significantly. It's great to have such a rich, powerful API that can do everything, but in the process we lost the ability to onboard people to the APIs.
> > > > > >>> > > > > >
> > > > > >>> > > > > >
> > > > > >>> > > > > > Best
> > > > > >>> > > > > > Ingo
> > > > > >>> > > > > >
> > > > > >>> > > > > > On 04.06.22 21:21, David Anderson wrote:
> > > > > >>> > > > > > > I'm in favor of this, but I think we need to make it easier to implement data generators and test sources. As things stand in 1.15, unless you can be satisfied with using a NumberSequenceSource followed by a map, things get quite complicated. I looked into reworking the data generators used in the training exercises, and got discouraged by the amount of work involved. (The sources used in the training want to be unbounded, and need watermarking in the sources, which means that using NumberSequenceSource isn't an option.)
> > > > > >>> > > > > > >
> > > > > >>> > > > > > > I think the proposed deprecation will be better received if it can be accompanied by something that makes implementing a simple Source easier than it is now. People are continuing to implement new SourceFunctions because the interfaces defined by FLIP-27 are hard to understand, while SourceFunction is easy. Alex, I believe you were looking into implementing an easier-to-use building block that could be used in situations like this. Can we get something like that in place first?
> > > > > >>> > > > > > >
> > > > > >>> > > > > > > David
> > > > > >>> > > > > > >
> > > > > >>> > > > > > > On Fri, Jun 3, 2022 at 4:52 PM Jing Ge <j...@ververica.com> wrote:
> > > > > >>> > > > > > >
> > > > > >>> > > > > > >> Hi,
> > > > > >>> > > > > > >>
> > > > > >>> > > > > > >> Thanks Alex for driving this!
> > > > > >>> > > > > > >>
> > > > > >>> > > > > > >> +1. To give Flink developers, especially connector developers, a clear signal that the new Source API is recommended according to FLIP-27, we should mark them as deprecated.
> > > > > >>> > > > > > >>
> > > > > >>> > > > > > >> There are some open questions to discuss:
> > > > > >>> > > > > > >>
> > > > > >>> > > > > > >> 1. Do we need to mark all subinterfaces/subclasses as deprecated? e.g. FromElementsFunction, etc.; there are many. What are the replacements?
> > > > > >>> > > > > > >> 2. Do we need to mark all subclasses that have a replacement as deprecated? e.g. ExternallyInducedSource, whose replacement class, if I am not mistaken, ExternallyInducedSourceReader, is @Experimental.
> > > > > >>> > > > > > >> 3. Do we need to mark all related test utility classes as deprecated?
> > > > > >>> > > > > > >>
> > > > > >>> > > > > > >> I think it might make sense to create an umbrella ticket to cover all of these with the following process:
> > > > > >>> > > > > > >>
> > > > > >>> > > > > > >> 1. Mark SourceFunction as deprecated asap.
> > > > > >>> > > > > > >> 2. Mark subinterfaces and subclasses as deprecated if there are graduated replacements. A good example is that KafkaSource replaced KafkaConsumer, which has been marked as deprecated.
> > > > > >>> > > > > > >> 3. Do not mark subinterfaces and subclasses as deprecated if the replacement classes are still experimental; check if it is time to graduate them. After graduation, go to step 2. It might take a while for graduation.
> > > > > >>> > > > > > >> 4. Do not mark subinterfaces and subclasses as deprecated if the replacement classes are experimental and are too young to graduate. We have to wait. But in this case we could create new tickets under the umbrella ticket.
> > > > > >>> > > > > > >> 5. Do not mark subinterfaces and subclasses as deprecated if there is no replacement at all. We have to create new tickets and wait until the new implementation has been done and graduated. It will take a longer time, roughly 1.5 years.
> > > > > >>> > > > > > >> 6. For test classes, we could follow the same rule. But I think for some cases, we could consider doing the replacement directly without going through the deprecation phase.
> > > > > >>> > > > > > >>
> > > > > >>> > > > > > >> When we look back on all of these, we can see that it is a big epic (even bigger than an epic). It needs someone to drive it and stay focused on it continuously, with support from the community, and to push the development towards the new Source API of FLIP-27.
> > > > > >>> > > > > > >>
> > > > > >>> > > > > > >> If we could have consensus for this, Alex and I could create the umbrella ticket to kick it off.
> > > > > >>> > > > > > >>
> > > > > >>> > > > > > >> Best regards,
> > > > > >>> > > > > > >> Jing
> > > > > >>> > > > > > >>
> > > > > >>> > > > > > >>
> > > > > >>> > > > > > >> On Fri, Jun 3, 2022 at 3:54 PM Alexander Fedulov <alexan...@ververica.com> wrote:
> > > > > >>> > > > > > >>
> > > > > >>> > > > > > >>> Hi everyone,
> > > > > >>> > > > > > >>>
> > > > > >>> > > > > > >>> I would like to start the discussion about marking SourceFunction-based interfaces as deprecated. With the FLIP-27 APIs becoming the new standard, the old ones have to be eventually phased out. Although this state is well known within the community and no new connectors based on the old interfaces can be accepted into the project, the footprint of SourceFunction in the user code still keeps growing (primarily for data generators and test utilities). I believe it is best to mark SourceFunction as deprecated as soon as possible. What do you think?
> > > > > >>> > > > > > >>>
> > > > > >>> > > > > > >>> Best,
> > > > > >>> > > > > > >>> Alexander Fedulov
> > > > > >>> >
> > > > > >>> > --
> > > > > >>> > https://twitter.com/snntrable
> > > > > >>> > https://github.com/knaufk