I am also for having only one source interface. It seems that interruptability is to much of a burden on the sources, locking version should be still acceptable from the user point of view. We are dealing with inherently concurrent tasks, I suppose our users are familiar with locking - especially the ones in need for exactly once processing.
On Sat, May 30, 2015 at 2:44 AM, Aljoscha Krettek <aljos...@apache.org> wrote: > I would also prefer having only one source. The PR still has both > variants so that people can check them out. > > In my opinion the assumptions about interruptibility are easier to > break than the requirement of locking. Even if we get the kafka source > to work with the interruptions (which I doubt, because this fails > somewhere in their code) this would not guarantee that this will > always work in future versions. With the locking you either have the > locking, then it is correct (even for feature versions) or you don't, > then it is immediately incorrect. > > On Fri, May 29, 2015 at 10:56 PM, Gyula Fóra <gyula.f...@gmail.com> wrote: > > Hey, > > > > It seems like both interfaces are pretty much capable of doing the same > > thing but work on slightly different assumptions. > > > > Isn't there a way that the kafka source can work with the interruptions? > I > > think the reachedEnd/next interface is slightly easier to grasp than the > > run() with the locks. But in any case I would slightly prefer having only > > one of them if they can technically do the same thing. > > > > Also adding a new interface means we add a new streamtask implementation > > which is also getting slightly too much. > > > > What is you opinion on this? > > > > Gyula > > > > > > > > On Fri, May 29, 2015 at 6:51 PM, Aljoscha Krettek <aljos...@apache.org> > > wrote: > > > >> Hi All, > >> after finishing my pull request that should fix the problems with the > >> synchronisation of checkpoints and element emission (the reason for > >> the faulty results of the exactly-once tests) I discovered that the > >> Kafka source does not deal well with being interrupted. We recently > >> changed the SourceFunction to the reachedEnd()/next() interface, with > >> the contract that the source must be interruptible to be able to > >> perform checkpoints. Now this doesn't seem to work with Kafka. I added > >> another Source interface in my PR > >> (https://github.com/apache/flink/pull/742). This is similar to the old > >> interface of run()/cancel(), with the addition that the source must > >> acquire a lock before updating state and emitting elements. The update > >> of state and the emission of elements must happen in the same > >> synchronized block to ensure consistency. This seems to solve the > >> problem but now we have two source interfaces. > >> > >> The question is now. What do you think about the two interfaces? > >> Should we keep both? Remove one? > >> > >> Cheers, > >> Aljoscha > >> >