That's awesome! I will look into it more this weekend.

On Tue, Aug 8, 2017 at 8:22 AM, Aljoscha Krettek <aljos...@apache.org> wrote:
> Hi,
>
> In Flink, there is a TwoPhaseCommit SinkFunction that can be used for such
> cases: [1]. The PR for a Kafka 0.11 exactly-once producer builds on that: [2]
>
> Best,
> Aljoscha
>
> [1] https://github.com/apache/flink/blob/62e99918a45b7215c099fbcf160d45aa02d4559e/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/functions/sink/TwoPhaseCommitSinkFunction.java#L55
> [2] https://github.com/apache/flink/pull/4239
>
>> On 3. Aug 2017, at 04:03, Raghu Angadi <rang...@google.com.INVALID> wrote:
>>
>> Kafka 0.11 added support for transactions [1], which allows end-to-end
>> exactly-once semantics. Beam's KafkaIO users can benefit from these while
>> using runners that support exactly-once processing.
>>
>> I have an implementation of EOS support for the Kafka sink:
>> https://github.com/apache/beam/pull/3612
>> It has two shuffles and builds on the Beam state API and the checkpoint
>> barrier between stages (as in Dataflow). The pull request has a longer
>> description.
>>
>> - What other runners, in addition to Dataflow, would be compatible with
>> such a strategy?
>> - I think it does not quite work for Flink (as it has a global checkpoint,
>> not one between stages). How would one go about implementing such a sink?
>>
>> Any comments on the pull request are also welcome.
>>
>> Thanks,
>> Raghu.
>>
>> [1] https://www.confluent.io/blog/exactly-once-semantics-are-possible-heres-how-apache-kafka-does-it/
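For readers following along: the protocol behind the TwoPhaseCommitSinkFunction referenced above can be sketched without any Flink dependency. The hook names below (beginTransaction / preCommit / commit / abort) mirror the shape of Flink's class, but the `Txn` type, the in-memory "output", and the driver in `main` are hypothetical stand-ins for illustration only; in the real Kafka 0.11 case, preCommit would flush the transactional producer and commit would call commitTransaction once the checkpoint completes. A minimal stdlib-only sketch:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the two-phase-commit sink pattern: writes are buffered in a
// transaction, prepared at the checkpoint barrier (pre-commit), and made
// visible only once the checkpoint completes (commit). On failure, abort
// discards the buffered writes so nothing uncommitted ever becomes visible.
public class TwoPhaseCommitSketch {

    // Hypothetical transaction: buffers records until commit (a stand-in
    // for e.g. a Kafka transactional producer or a temp file).
    static class Txn {
        final List<String> buffered = new ArrayList<>();
        boolean preCommitted = false;
    }

    // Stand-in for the external system's committed, visible output.
    static final List<String> committedOutput = new ArrayList<>();

    static Txn beginTransaction() { return new Txn(); }

    // Called per record while the transaction is open.
    static void invoke(Txn txn, String record) { txn.buffered.add(record); }

    // Pre-commit at the checkpoint barrier: after this, commit must be able
    // to succeed even across a restart.
    static void preCommit(Txn txn) { txn.preCommitted = true; }

    // Commit once the checkpoint is complete; only now do records become visible.
    static void commit(Txn txn) {
        if (!txn.preCommitted) throw new IllegalStateException("commit before pre-commit");
        committedOutput.addAll(txn.buffered);
        txn.buffered.clear();
    }

    // Abort on failure/restore: drop everything that was never committed.
    static void abort(Txn txn) { txn.buffered.clear(); }

    public static void main(String[] args) {
        // Checkpoint n succeeds: its records are emitted exactly once.
        Txn t1 = beginTransaction();
        invoke(t1, "a");
        invoke(t1, "b");
        preCommit(t1);
        commit(t1);

        // A later transaction fails before its checkpoint completes: abort,
        // and its record never reaches the visible output.
        Txn t2 = beginTransaction();
        invoke(t2, "c");
        abort(t2);

        System.out.println(committedOutput); // prints [a, b]
    }
}
```

The key property the pattern relies on is that commit is deferred until the checkpoint is durable and must be idempotent, since a recovered job may re-attempt commits for transactions whose pre-commit survived the failure.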