Re: [DISCUSS] Streaming Sources (again)

Gyula Fóra Fri, 29 May 2015 13:58:33 -0700

Hey,

It seems like both interfaces are pretty much capable of doing the same
thing but work on slightly different assumptions.

Isn't there a way for the Kafka source to work with interruptions? I
think the reachedEnd()/next() interface is slightly easier to grasp than
run() with the locks. But in any case I would slightly prefer having only
one of them if they can technically do the same thing.

Also, adding a new interface means adding another StreamTask
implementation, and the number of those is also getting slightly too large.

What is your opinion on this?

Gyula

On Fri, May 29, 2015 at 6:51 PM, Aljoscha Krettek <aljos...@apache.org>
wrote:

> Hi All,
> after finishing my pull request that should fix the problems with the
> synchronisation of checkpoints and element emission (the reason for
> the faulty results of the exactly-once tests) I discovered that the
> Kafka source does not deal well with being interrupted. We recently
> changed the SourceFunction to the reachedEnd()/next() interface, with
> the contract that the source must be interruptible to be able to
> perform checkpoints. Now this doesn't seem to work with Kafka. I added
> another Source interface in my PR
> (https://github.com/apache/flink/pull/742). This is similar to the old
> interface of run()/cancel(), with the addition that the source must
> acquire a lock before updating state and emitting elements. The update
> of state and the emission of elements must happen in the same
> synchronized block to ensure consistency. This seems to solve the
> problem but now we have two source interfaces.
>
> The question now is: what do you think about the two interfaces?
> Should we keep both? Remove one?
>
> Cheers,
> Aljoscha
>
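
For readers comparing the two styles discussed above, here is a minimal
sketch in Java. It is not the actual Flink API or the code from the PR:
PullSource, PushSource, the counting sources and the helper methods are
all hypothetical names, and a plain List stands in for the collector. It
only illustrates the two contracts as described in the thread: the caller
driving reachedEnd()/next(), versus the source driving run() and holding a
lock while it updates state and emits.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: every name here is hypothetical, not the Flink API.
class SourceSketch {

    // Pull style: the caller drives the loop, so it can checkpoint between calls.
    interface PullSource<T> {
        boolean reachedEnd();
        T next();
    }

    // Push style: the source drives the loop and must hold the lock
    // while it updates its state and emits an element.
    interface PushSource<T> {
        void run(Object checkpointLock, List<T> out);
        void cancel();
    }

    static class CountingPullSource implements PullSource<Long> {
        private final long limit;
        private long offset = 0; // state a checkpoint would snapshot

        CountingPullSource(long limit) { this.limit = limit; }

        @Override public boolean reachedEnd() { return offset >= limit; }
        @Override public Long next() { return offset++; }
    }

    static class CountingPushSource implements PushSource<Long> {
        private final long limit;
        private long offset = 0; // state a checkpoint would snapshot
        private volatile boolean running = true;

        CountingPushSource(long limit) { this.limit = limit; }

        @Override
        public void run(Object checkpointLock, List<Long> out) {
            while (running) {
                // Emission and state update happen in the same synchronized
                // block, so a checkpoint taken under this lock never sees
                // an element emitted without the matching offset change.
                synchronized (checkpointLock) {
                    if (offset >= limit) { return; }
                    out.add(offset);
                    offset++;
                }
            }
        }

        @Override public void cancel() { running = false; }
    }

    // Drives a pull source the way a runtime loop would.
    static List<Long> drainPull(long limit) {
        PullSource<Long> source = new CountingPullSource(limit);
        List<Long> out = new ArrayList<>();
        while (!source.reachedEnd()) {
            out.add(source.next());
        }
        return out;
    }

    // Hands control to a push source until it finishes on its own.
    static List<Long> runPush(long limit) {
        PushSource<Long> source = new CountingPushSource(limit);
        List<Long> out = new ArrayList<>();
        source.run(new Object(), out);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(drainPull(3)); // prints [0, 1, 2]
        System.out.println(runPush(3));   // prints [0, 1, 2]
    }
}
```

Both variants emit the same elements. The difference is who owns the loop:
in the pull style the caller can pause between next() calls, while in the
push style a checkpointing thread would have to take the same lock to see
a consistent offset.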
