Hey Austin, Since we are getting deeper into the implementation details of the DataGeneratorSource and it is not the main topic of this thread, I propose to move our discussion to where it belongs: [DISCUSS] FLIP-238 [1]. Could you please briefly formulate your requirements to make it easier for the others to follow? I am happy to continue this conversation there.
[1] https://lists.apache.org/thread/7gjxto1rmkpff4kl54j8nlg5db2rqhkt Best, Alexander Fedulov On Tue, Jun 7, 2022 at 6:14 PM Austin Cawley-Edwards < [email protected]> wrote: > > @Austin, in the FLIP I mentioned above [1], the user is expected to > pass a MapFunction<Long, > OUT> > to the generator. I wonder if you could have your external client and > polling logic wrapped in a custom > MapFunction implementation class? Would that answer your needs or do you > have some > more sophisticated scenario in mind? > > At first glance, the FLIP looks good but for this case in regards to the > map function, but leaves out 1) ability to control polling intervals and 2) > ability to produce an unknown number of records, both per-poll and overall > boundedness. Do you think something like this could be built from the same > pieces? > I'm also wondering what handles threading, is that on the user or is that > part of the DataGeneratorSource? > > Best, > Austin > > On Tue, Jun 7, 2022 at 9:34 AM Alexander Fedulov <[email protected]> > wrote: > > > Hi everyone, > > > > Thanks for all the input and a lively discussion. It seems that there is > a > > consensus that due to > > the inherent complexity of FLIP-27 sources we should provide more > > user-facing utilities to bridge > > the gap between the existing SourceFunction-based functionality and the > new > > APIs. > > > > To start addressing this I picked the issue that David raised and many > > upvoted. Here is a proposal > > for the new DataGeneratorSource: FLIP-238 [1]. Please take a look, I am > > going to open a separate > > discussion thread on it shortly. > > > > Jing also raised some great points regarding the interfaces and > subclasses. > > It seems to me that > > what might actually help is some sort of a "soft deprecation" concept and > > annotation. It could be > > used in places where we do not have an alternative implementation yet, > but > > we clearly want > > to indicate that continuing to build on top of these interfaces is > > discouraged. The area of > > impact of deprecating all SourceFunction subclasses is rather big, and we > > can expect it to > > take a while. The hope would be that if in the meantime someone finds > > themselves using one of > > such old APIs, the "soft deprecation" annotation will be a clear > indication > > and encouragement to > > work on introducing an alternative FLIP-27-based implementation instead. > > > > @Austin, in the FLIP I mentioned above [1], the user is expected to > > pass a MapFunction<Long, > > OUT> > > to the generator. I wonder if you could have your external client and > > polling logic wrapped in a custom > > MapFunction implementation class? Would that answer your needs or do you > > have some > > more sophisticated scenario in mind? > > > > [1] https://cwiki.apache.org/confluence/x/9Av1D > > Best, > > Alexander Fedulov > > > > On Mon, Jun 6, 2022 at 7:08 PM Austin Cawley-Edwards < > > [email protected]> wrote: > > > > > Thanks for the nice discussion all. > > > > > > I was recently trying to implement a very simple polling source and > > > would've loved a higher-level base to work from. I'm wondering if in > > > addition to the data generator use cases, it would be good to support a > > > simple non-parallel polling abstraction to make it easier to, for > > instance, > > > start prototyping with data in existing APIs without adding a Kafka or > > such > > > in the middle. > > > > > > Best, > > > Austin > > > > > > On Mon, Jun 6, 2022 at 10:02 AM tison <[email protected]> wrote: > > > > > > > Well. It's a bit off-topic. For deprecating SourceFunction as FLIP-27 > > > > series works go ahead, +1 from my side. It's a significant work > towards > > > the > > > > unification of batch and streaming effort :) > > > > > > > > Best, > > > > tison. > > > > > > > > > > > > tison <[email protected]> 于2022年6月6日周一 21:54写道: > > > > > > > > > The starting point of the version bump and removal question is that > > > > > downstream projects may experience a tough time to adapt new > > interfaces > > > > > while Flink keeps in 1.x versions so that users may expect it as an > > > easy > > > > > task. From my experience, it's really challenge to maintain > > > > > compatibility between multiple versions of Flink while significant > > > > changes > > > > > made but sharing 1.x version series - users may not be aware that > > it's > > > > > almost a major version bump. > > > > > > > > > > Best, > > > > > tison. > > > > > > > > > > > > > > > tison <[email protected]> 于2022年6月6日周一 21:51写道: > > > > > > > > > >> One question from my side: > > > > >> > > > > >> As SourceFunction a @Public interface, we cannot remove it before > > > doing > > > > >> a major version bump (Flink 2.0). > > > > >> > > > > >> Of course it's not a blocker to make such deprecation and let the > > new > > > > >> interface step in. My question is whether we have a plan to > finally > > > > remove > > > > >> the deprecated interfaces, or postpone it until a clear plan of > > Flink > > > > 2.0? > > > > >> > > > > >> Best, > > > > >> tison. > > > > >> > > > > >> > > > > >> David Anderson <[email protected]> 于2022年6月6日周一 21:35写道: > > > > >> > > > > >>> > > > > > >>> > David, can you elaborate why you need watermark generation in > the > > > > >>> source > > > > >>> > for your data generators? > > > > >>> > > > > >>> > > > > >>> The training exercises should strive to provide examples of best > > > > >>> practices. > > > > >>> If the exercises and their solutions use > > > > >>> > > > > >>> env.fromSource(source, WatermarkStrategy.noWatermarks(), > > > > >>> "name-of-source") > > > > >>> .map(...) > > > > >>> .assignTimestampsAndWatermarks(...) > > > > >>> > > > > >>> this will help establish this anti-pattern as the normal way of > > doing > > > > >>> things. > > > > >>> > > > > >>> Most new Flink users are using a KafkaSource with a noWatermarks > > > > strategy > > > > >>> and a SimpleStringSchema, followed by a map that does the real > > > > >>> deserialization, followed by the real watermarking -- because > they > > > > aren't > > > > >>> seeing examples that teach how these interfaces are meant to be > > used. > > > > >>> > > > > >>> When we redo the sources used in training exercises, I want to > > avoid > > > > >>> these > > > > >>> pitfalls. > > > > >>> > > > > >>> David > > > > >>> > > > > >>> On Mon, Jun 6, 2022 at 9:12 AM Konstantin Knauf < > [email protected] > > > > > > > >>> wrote: > > > > >>> > > > > >>> > Hi everyone, > > > > >>> > > > > > >>> > very interesting thread. The proposal for deprecation seems to > > have > > > > >>> sparked > > > > >>> > a very important discussion. Do we what users struggle with > > > > >>> specifically? > > > > >>> > > > > > >>> > Speaking for myself, when I upgrade flink-faker to the new > Source > > > API > > > > >>> an > > > > >>> > unbounded version of the NumberSequenceSource would have been > > all I > > > > >>> needed, > > > > >>> > but that's just the data generator use case. I think, that one > > > could > > > > be > > > > >>> > solved quite easily. David, can you elaborate why you need > > > watermark > > > > >>> > generation in the source for your data generators? > > > > >>> > > > > > >>> > Cheers, > > > > >>> > > > > > >>> > Konstantin > > > > >>> > > > > > >>> > > > > > >>> > > > > > >>> > > > > > >>> > > > > > >>> > Am So., 5. Juni 2022 um 17:48 Uhr schrieb Piotr Nowojski < > > > > >>> > [email protected]>: > > > > >>> > > > > > >>> > > Also +1 to what David has written. But it doesn't mean we > > should > > > be > > > > >>> > waiting > > > > >>> > > indefinitely to deprecate SourceFunction. > > > > >>> > > > > > > >>> > > Best, > > > > >>> > > Piotrek > > > > >>> > > > > > > >>> > > niedz., 5 cze 2022 o 16:46 Jark Wu <[email protected]> > > > napisał(a): > > > > >>> > > > > > > >>> > > > +1 to David's point. > > > > >>> > > > > > > > >>> > > > Usually, when we deprecate some interfaces, we should point > > > users > > > > >>> to > > > > >>> > use > > > > >>> > > > the recommended alternatives. > > > > >>> > > > However, implementing the new Source interface for some > > simple > > > > >>> > scenarios > > > > >>> > > is > > > > >>> > > > too challenging and complex. > > > > >>> > > > We also found it isn't easy to push the internal connector > to > > > > >>> upgrade > > > > >>> > to > > > > >>> > > > the new Source because > > > > >>> > > > "FLIP-27 are hard to understand, while SourceFunction is > > easy". > > > > >>> > > > > > > > >>> > > > +1 to make implementing a simple Source easier before > > > deprecating > > > > >>> > > > SourceFunction. > > > > >>> > > > > > > > >>> > > > Best, > > > > >>> > > > Jark > > > > >>> > > > > > > > >>> > > > > > > > >>> > > > On Sun, 5 Jun 2022 at 07:29, Jingsong Lee < > > > > [email protected] > > > > >>> > > > > > >>> > > wrote: > > > > >>> > > > > > > > >>> > > > > +1 to David and Ingo. > > > > >>> > > > > > > > > >>> > > > > Before deprecate and remove SourceFunction, we should > have > > > some > > > > >>> > easier > > > > >>> > > > APIs > > > > >>> > > > > to wrap new Source, the cost to write a new Source is too > > > high > > > > >>> now. > > > > >>> > > > > > > > > >>> > > > > > > > > >>> > > > > > > > > >>> > > > > Ingo Bürk <[email protected]>于2022年6月5日 周日05:32写道: > > > > >>> > > > > > > > > >>> > > > > > I +1 everything David said. The new Source API raised > the > > > > >>> > complexity > > > > >>> > > > > > significantly. It's great to have such a rich, powerful > > API > > > > >>> that > > > > >>> > can > > > > >>> > > do > > > > >>> > > > > > everything, but in the process we lost the ability to > > > onboard > > > > >>> > people > > > > >>> > > to > > > > >>> > > > > > the APIs. > > > > >>> > > > > > > > > > >>> > > > > > > > > > >>> > > > > > Best > > > > >>> > > > > > Ingo > > > > >>> > > > > > > > > > >>> > > > > > On 04.06.22 21:21, David Anderson wrote: > > > > >>> > > > > > > I'm in favor of this, but I think we need to make it > > > easier > > > > >>> to > > > > >>> > > > > implement > > > > >>> > > > > > > data generators and test sources. As things stand in > > > 1.15, > > > > >>> unless > > > > >>> > > you > > > > >>> > > > > can > > > > >>> > > > > > > be satisfied with using a NumberSequenceSource > followed > > > by > > > > a > > > > >>> map, > > > > >>> > > > > things > > > > >>> > > > > > > get quite complicated. I looked into reworking the > data > > > > >>> > generators > > > > >>> > > > used > > > > >>> > > > > > in > > > > >>> > > > > > > the training exercises, and got discouraged by the > > amount > > > > of > > > > >>> work > > > > >>> > > > > > involved. > > > > >>> > > > > > > (The sources used in the training want to be > unbounded, > > > and > > > > >>> need > > > > >>> > > > > > > watermarking in the sources, which means that using > > > > >>> > > > > NumberSequenceSource > > > > >>> > > > > > > isn't an option.) > > > > >>> > > > > > > > > > > >>> > > > > > > I think the proposed deprecation will be better > > received > > > if > > > > >>> it > > > > >>> > can > > > > >>> > > be > > > > >>> > > > > > > accompanied by something that makes implementing a > > simple > > > > >>> Source > > > > >>> > > > easier > > > > >>> > > > > > > than it is now. People are continuing to implement > new > > > > >>> > > > SourceFunctions > > > > >>> > > > > > > because the interfaces defined by FLIP-27 are hard to > > > > >>> understand, > > > > >>> > > > while > > > > >>> > > > > > > SourceFunction is easy. Alex, I believe you were > > looking > > > > into > > > > >>> > > > > > implementing > > > > >>> > > > > > > an easier-to-use building block that could be used in > > > > >>> situations > > > > >>> > > like > > > > >>> > > > > > this. > > > > >>> > > > > > > Can we get something like that in place first? > > > > >>> > > > > > > > > > > >>> > > > > > > David > > > > >>> > > > > > > > > > > >>> > > > > > > On Fri, Jun 3, 2022 at 4:52 PM Jing Ge < > > > [email protected] > > > > > > > > > >>> > wrote: > > > > >>> > > > > > > > > > > >>> > > > > > >> Hi, > > > > >>> > > > > > >> > > > > >>> > > > > > >> Thanks Alex for driving this! > > > > >>> > > > > > >> > > > > >>> > > > > > >> +1 To give the Flink developers, especially > Connector > > > > >>> developers > > > > >>> > > the > > > > >>> > > > > > clear > > > > >>> > > > > > >> signal that the new Source API is recommended > > according > > > to > > > > >>> > > FLIP-27, > > > > >>> > > > we > > > > >>> > > > > > >> should mark them as deprecated. > > > > >>> > > > > > >> > > > > >>> > > > > > >> There are some open questions to discuss: > > > > >>> > > > > > >> > > > > >>> > > > > > >> 1. Do we need to mark all subinterfaces/subclasses > as > > > > >>> > deprecated? > > > > >>> > > > e.g. > > > > >>> > > > > > >> FromElementsFunction, etc. there are many. What are > > the > > > > >>> > > > replacements? > > > > >>> > > > > > >> 2. Do we need to mark all subclasses that have > > > replacement > > > > >>> as > > > > >>> > > > > > deprecated? > > > > >>> > > > > > >> e.g. ExternallyInducedSource whose replacement > class, > > > if I > > > > >>> am > > > > >>> > not > > > > >>> > > > > > mistaken, > > > > >>> > > > > > >> ExternallyInducedSourceReader is @Experimental > > > > >>> > > > > > >> 3. Do we need to mark all related test utility > classes > > > as > > > > >>> > > > deprecated? > > > > >>> > > > > > >> > > > > >>> > > > > > >> I think it might make sense to create an umbrella > > ticket > > > > to > > > > >>> > cover > > > > >>> > > > all > > > > >>> > > > > of > > > > >>> > > > > > >> these with the following process: > > > > >>> > > > > > >> > > > > >>> > > > > > >> 1. Mark SourceFunction as deprecated asap. > > > > >>> > > > > > >> 2. Mark subinterfaces and subclasses as deprecated, > if > > > > >>> there are > > > > >>> > > > > > graduated > > > > >>> > > > > > >> replacements. Good example is that KafkaSource > > replaced > > > > >>> > > > KafkaConsumer > > > > >>> > > > > > which > > > > >>> > > > > > >> has been marked as deprecated. > > > > >>> > > > > > >> 3. Do not mark subinterfaces and subclasses as > > > deprecated, > > > > >>> if > > > > >>> > > > > > replacement > > > > >>> > > > > > >> classes are still experimental, check if it is time > to > > > > >>> graduate > > > > >>> > > > them. > > > > >>> > > > > > After > > > > >>> > > > > > >> graduation, go to step 2. It might take a while for > > > > >>> graduation. > > > > >>> > > > > > >> 4. Do not mark subinterfaces and subclasses as > > > deprecated, > > > > >>> if > > > > >>> > the > > > > >>> > > > > > >> replacement classes are experimental and are too > young > > > to > > > > >>> > > graduate. > > > > >>> > > > We > > > > >>> > > > > > have > > > > >>> > > > > > >> to wait. But in this case we could create new > tickets > > > > under > > > > >>> the > > > > >>> > > > > umbrella > > > > >>> > > > > > >> ticket. > > > > >>> > > > > > >> 5. Do not mark subinterfaces and subclasses as > > > deprecated, > > > > >>> if > > > > >>> > > there > > > > >>> > > > is > > > > >>> > > > > > no > > > > >>> > > > > > >> replacement at all. We have to create new tickets > and > > > wait > > > > >>> until > > > > >>> > > the > > > > >>> > > > > new > > > > >>> > > > > > >> implementation has been done and graduated. It will > > > take a > > > > >>> > longer > > > > >>> > > > > time, > > > > >>> > > > > > >> roughly 1,5 years. > > > > >>> > > > > > >> 6. For test classes, we could follow the same rule. > > But > > > I > > > > >>> think > > > > >>> > > for > > > > >>> > > > > some > > > > >>> > > > > > >> cases, we could consider doing the replacement > > directly > > > > >>> without > > > > >>> > > > going > > > > >>> > > > > > >> through the deprecation phase. > > > > >>> > > > > > >> > > > > >>> > > > > > >> When we look back on all of these, we can realize it > > is > > > a > > > > >>> big > > > > >>> > epic > > > > >>> > > > > (even > > > > >>> > > > > > >> bigger than an epic). It needs someone to drive it > and > > > > keep > > > > >>> > focus > > > > >>> > > on > > > > >>> > > > > it > > > > >>> > > > > > >> continuously with support from the community and > push > > > the > > > > >>> > > > development > > > > >>> > > > > > >> towards the new Source API of FLIP-27. > > > > >>> > > > > > >> > > > > >>> > > > > > >> If we could have consensus for this, Alex and I > could > > > > >>> create > > > > >>> > the > > > > >>> > > > > > umbrella > > > > >>> > > > > > >> ticket to kick it off. > > > > >>> > > > > > >> > > > > >>> > > > > > >> Best regards, > > > > >>> > > > > > >> Jing > > > > >>> > > > > > >> > > > > >>> > > > > > >> > > > > >>> > > > > > >> On Fri, Jun 3, 2022 at 3:54 PM Alexander Fedulov < > > > > >>> > > > > > [email protected]> > > > > >>> > > > > > >> wrote: > > > > >>> > > > > > >> > > > > >>> > > > > > >>> Hi everyone, > > > > >>> > > > > > >>> > > > > >>> > > > > > >>> I would like to start the discussion about marking > > > > >>> > > > > SourceFunction-based > > > > >>> > > > > > >>> interfaces as deprecated. With the FLIP-27 APIs > > > becoming > > > > >>> the > > > > >>> > new > > > > >>> > > > > > >> standard, > > > > >>> > > > > > >>> the old ones have to be eventually phased out. > > Although > > > > >>> this > > > > >>> > > state > > > > >>> > > > is > > > > >>> > > > > > >> well > > > > >>> > > > > > >>> known within the community and no new connectors > > based > > > on > > > > >>> the > > > > >>> > old > > > > >>> > > > > > >>> interfaces can be accepted into the project, the > > > > footprint > > > > >>> of > > > > >>> > > > > > >>> SourceFunction in the user code still keeps growing > > > > >>> (primarily > > > > >>> > > for > > > > >>> > > > > data > > > > >>> > > > > > >>> generators and test utilities). I believe it is > best > > to > > > > >>> mark > > > > >>> > > > > > >> SourceFunction > > > > >>> > > > > > >>> as deprecated as soon as possible. What do you > think? > > > > >>> > > > > > >>> > > > > >>> > > > > > >>> Best, > > > > >>> > > > > > >>> Alexander Fedulov > > > > >>> > > > > > >>> > > > > >>> > > > > > >> > > > > >>> > > > > > > > > > > >>> > > > > > > > > > >>> > > > > > > > > >>> > > > > > > > >>> > > > > > > >>> > > > > > >>> > > > > > >>> > -- > > > > >>> > https://twitter.com/snntrable > > > > >>> > https://github.com/knaufk > > > > >>> > > > > > >>> > > > > >> > > > > > > > > > >
