Kafka 0.11 added support for transactions[1], which enables end-to-end
exactly-once semantics. Beam's KafkaIO users can benefit from this when
using runners that support exactly-once processing.
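For context, the producer side of Kafka 0.11 transactions looks roughly like
the sketch below. This is the standard KafkaProducer transactional API, not
Beam code; the topic name, config values and transactional.id are made up:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.KafkaException;
import org.apache.kafka.common.errors.ProducerFencedException;

public class TxnSketch {
  public static void main(String[] args) {
    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092");
    props.put("enable.idempotence", "true");
    props.put("transactional.id", "my-sink-txn-id");   // required for transactions
    props.put("key.serializer",
        "org.apache.kafka.common.serialization.StringSerializer");
    props.put("value.serializer",
        "org.apache.kafka.common.serialization.StringSerializer");

    KafkaProducer<String, String> producer = new KafkaProducer<>(props);
    producer.initTransactions();
    try {
      producer.beginTransaction();
      producer.send(new ProducerRecord<>("my-topic", "key", "value"));
      // Records become visible to read_committed consumers atomically on commit.
      producer.commitTransaction();
    } catch (ProducerFencedException e) {
      // Another producer with the same transactional.id took over; close this one.
      producer.close();
    } catch (KafkaException e) {
      producer.abortTransaction();
    }
    producer.close();
  }
}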

I have an implementation of EOS support for the Kafka sink:
https://github.com/apache/beam/pull/3612
It uses two shuffles and builds on the Beam state API and the checkpoint
barrier between stages (as in Dataflow). The pull request has a longer
description.
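From a pipeline author's point of view, the intent is that an exactly-once
write would look roughly like the snippet below. Method names such as
withEOS(numShards, sinkGroupId) are taken from the PR and may still change,
so treat this as illustrative rather than final:

  Pipeline p = Pipeline.create(options);
  p.apply(Create.of(KV.of("key", "value")))
   .apply(KafkaIO.<String, String>write()
       .withBootstrapServers("broker-1:9092,broker-2:9092")
       .withTopic("results")
       .withKeySerializer(StringSerializer.class)
       .withValueSerializer(StringSerializer.class)
       .withEOS(20, "eos-sink-group-id"));   // number of shards, sink group id
  p.run();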

- What other runners, in addition to Dataflow, would be compatible with such
a strategy?
- I don't think it quite works for Flink, since Flink takes a global
checkpoint rather than a checkpoint between stages. How would one go about
implementing such a sink there?

Any comments on the pull request are also welcome.

Thanks,
Raghu.

[1]
https://www.confluent.io/blog/exactly-once-semantics-are-possible-heres-how-apache-kafka-does-it/
