I am also in favor of having only one source interface. It seems that
interruptibility is too much of a burden on the sources; the locking version
should still be acceptable from the user's point of view. We are dealing with
inherently concurrent tasks, and I suppose our users are familiar with locking
- especially the ones who need exactly-once processing.

On Sat, May 30, 2015 at 2:44 AM, Aljoscha Krettek <aljos...@apache.org>
wrote:

> I would also prefer having only one source. The PR still has both
> variants so that people can check them out.
>
> In my opinion the assumptions about interruptibility are easier to
> break than the requirement of locking. Even if we get the Kafka source
> to work with the interruptions (which I doubt, because this fails
> somewhere in their code) this would not guarantee that it will
> always work in future versions. With locking, you either take the
> lock, in which case it is correct (even in future versions), or you
> don't, in which case it is immediately incorrect.
>
> On Fri, May 29, 2015 at 10:56 PM, Gyula Fóra <gyula.f...@gmail.com> wrote:
> > Hey,
> >
> > It seems like both interfaces are pretty much capable of doing the same
> > thing but work on slightly different assumptions.
> >
> > Isn't there a way that the Kafka source can work with the interruptions?
> > I think the reachedEnd/next interface is slightly easier to grasp than the
> > run() with the locks. But in any case I would slightly prefer having only
> > one of them if they can technically do the same thing.
> >
> > Also, adding a new interface means adding a new stream task
> > implementation, and we are already getting slightly too many of those.
> >
> > What is your opinion on this?
> >
> > Gyula
> >
> >
> >
> > On Fri, May 29, 2015 at 6:51 PM, Aljoscha Krettek <aljos...@apache.org>
> > wrote:
> >
> >> Hi All,
> >> after finishing my pull request that should fix the problems with the
> >> synchronisation of checkpoints and element emission (the reason for
> >> the faulty results of the exactly-once tests) I discovered that the
> >> Kafka source does not deal well with being interrupted. We recently
> >> changed the SourceFunction to the reachedEnd()/next() interface, with
> >> the contract that the source must be interruptible to be able to
> >> perform checkpoints. Now this doesn't seem to work with Kafka. I added
> >> another Source interface in my PR
> >> (https://github.com/apache/flink/pull/742). This is similar to the old
> >> interface of run()/cancel(), with the addition that the source must
> >> acquire a lock before updating state and emitting elements. The update
> >> of state and the emission of elements must happen in the same
> >> synchronized block to ensure consistency. This seems to solve the
> >> problem but now we have two source interfaces.
> >>
> >> The question now is: what do you think about the two interfaces?
> >> Should we keep both? Remove one?
> >>
> >> Cheers,
> >> Aljoscha
> >>
>
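
For readers who have not looked at the PR, the locking contract from the
original message (acquire a lock, then update state and emit in the same
synchronized block) can be sketched roughly as below. This is not the code
from the PR and not Flink's API: LockingSource, Collector, CountingSource,
checkpointLock and snapshotState are hypothetical names chosen for
illustration, and the "checkpointer" thread only stands in for whatever
takes the snapshot in the real runtime.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

/** Illustrative sketch only; none of these types are Flink's real API. */
class LockingSourceSketch {

    /** Hypothetical stand-in for the output a source emits into. */
    interface Collector<T> {
        void collect(T element);
    }

    /** Hypothetical run()/cancel() source that is handed a checkpoint lock. */
    interface LockingSource<T> {
        void run(Object checkpointLock, Collector<T> out) throws Exception;
        void cancel();
    }

    /** Emits 0..limit-1; its state is the next value to emit. */
    static class CountingSource implements LockingSource<Long> {
        private final long limit;
        private long next = 0;                    // guarded by checkpointLock
        private volatile boolean running = true;

        CountingSource(long limit) {
            this.limit = limit;
        }

        @Override
        public void run(Object checkpointLock, Collector<Long> out) {
            while (running) {
                // Emission and state update happen in ONE synchronized block,
                // so a snapshot can never observe one without the other.
                synchronized (checkpointLock) {
                    if (next >= limit) {
                        return;
                    }
                    out.collect(next);
                    next++;
                }
            }
        }

        @Override
        public void cancel() {
            running = false;
        }

        /** Caller must hold the checkpoint lock. */
        long snapshotState() {
            return next;
        }
    }

    /** Runs the source while a second thread snapshots it under the lock. */
    static boolean runDemo(long limit) {
        final Object lock = new Object();
        final List<Long> emitted = new ArrayList<>();  // only touched under lock
        final CountingSource source = new CountingSource(limit);
        final AtomicBoolean done = new AtomicBoolean(false);
        final boolean[] consistent = {true};

        Thread checkpointer = new Thread(() -> {
            while (!done.get()) {
                synchronized (lock) {
                    // Under the lock, state and emitted elements must agree.
                    if (source.snapshotState() != emitted.size()) {
                        consistent[0] = false;
                    }
                }
                Thread.yield();
            }
        });
        checkpointer.start();
        source.run(lock, emitted::add);
        done.set(true);
        try {
            checkpointer.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return consistent[0] && emitted.size() == limit;
    }

    public static void main(String[] args) {
        System.out.println("all snapshots consistent: " + runDemo(100_000));
    }
}
```

Note that the source is stopped through cancel() and a flag rather than by
interrupting its thread, which is the point of the locking variant: nothing
here depends on third-party code reacting correctly to an interrupt.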
