[
https://issues.apache.org/jira/browse/BEAM-6751?focusedWorklogId=205826&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-205826
]
ASF GitHub Bot logged work on BEAM-6751:
----------------------------------------
Author: ASF GitHub Bot
Created on: 28/Feb/19 14:54
Start Date: 28/Feb/19 14:54
Worklog Time Spent: 10m
Work Description: mxm commented on issue #7955: [BEAM-6751] Extend Kafka
EOS mode whitelist / Warn instead of throw
URL: https://github.com/apache/beam/pull/7955#issuecomment-468302127
You are correct, Flink's KafkaProducer starts a new transaction for every
checkpoint and only acknowledges the transaction upon completion of the
checkpoint. However, we do not replace KafkaIO with Flink's KafkaProducer
because it seems impracticable to map its configuration to Flink's.
After taking a closer look at Beam's `KafkaExactlyOnceSink`, it does not look
like we can achieve exactly-once semantics with the Flink Runner. We can
potentially process elements multiple times if we restore from a checkpoint.
That is unfortunate because Flink has all the building blocks to ensure
exactly-once. It is also hard to sell to Beam users who are used to
exactly-once in Flink's KafkaProducer.
A way to fix this would be to provide an (optional) hook in DoFn for
committing pending work. I suppose SDF would have something like this.
Alternatively, we might have to consider translating KafkaIO to Flink's
KafkaProducer.
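To illustrate the commit-on-checkpoint idea discussed above, here is a minimal, self-contained sketch. It is an assumption-laden simplification, not Beam or Flink code: `FakeTopic`, `TransactionalSink`, `onCheckpointComplete`, and `onRestore` are hypothetical names, and the real Flink producer uses a two-phase commit protocol that this sketch only approximates.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; none of these names are Beam or Flink APIs. */
public class CheckpointCommitSketch {

    /** Stands in for an external system that only exposes committed records. */
    static final class FakeTopic {
        final List<String> committed = new ArrayList<>();
    }

    /** Buffers writes per checkpoint and publishes them only on completion. */
    static final class TransactionalSink {
        private final FakeTopic topic;
        private List<String> pending = new ArrayList<>();

        TransactionalSink(FakeTopic topic) {
            this.topic = topic;
        }

        void write(String record) {
            pending.add(record);
        }

        /** Hypothetical "commit pending work" hook, invoked once a checkpoint completes. */
        void onCheckpointComplete() {
            topic.committed.addAll(pending);
            pending = new ArrayList<>();
        }

        /** On restore, uncommitted work is dropped so replayed elements are not duplicated. */
        void onRestore() {
            pending = new ArrayList<>();
        }
    }

    static List<String> run() {
        FakeTopic topic = new FakeTopic();
        TransactionalSink sink = new TransactionalSink(topic);
        sink.write("a");
        sink.write("b");
        sink.onCheckpointComplete(); // checkpoint 1 completes: a and b become visible
        sink.write("c");             // failure happens before checkpoint 2 completes
        sink.onRestore();            // the pending "c" is discarded
        sink.write("c");             // the source replays "c" from checkpoint 1
        sink.onCheckpointComplete(); // checkpoint 2 completes: c becomes visible once
        return topic.committed;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [a, b, c]
    }
}
```

Without a hook like `onCheckpointComplete`, the sink would have to publish records as they arrive, and the replayed "c" would appear twice after the restore.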
----------------------------------------------------------------
Issue Time Tracking
-------------------
Worklog Id: (was: 205826)
Time Spent: 0.5h (was: 20m)
> KafkaIO blocks FlinkRunner in EOS mode
> --------------------------------------
>
> Key: BEAM-6751
> URL: https://issues.apache.org/jira/browse/BEAM-6751
> Project: Beam
> Issue Type: Bug
> Components: io-java-kafka, runner-flink
> Reporter: Maximilian Michels
> Assignee: Maximilian Michels
> Priority: Critical
> Fix For: 2.12.0
>
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> KafkaIO has a validation check which whitelists certain runners capable of
> providing exactly-once semantics:
> {noformat}
> if ("org.apache.beam.runners.direct.DirectRunner".equals(runner)
> || runner.startsWith("org.apache.beam.runners.dataflow.")
> || runner.startsWith("org.apache.beam.runners.spark.")) {
> ...
> {noformat}
> The FlinkRunner supports exactly-once checkpointing but is blocked from using
> Kafka's exactly-once mode.
> I wonder if such a list is easily maintainable? I think we should replace the
> list with a warning instead.
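The proposal to warn instead of throw could look roughly like the sketch below. It is not the actual KafkaIO change: the class `RunnerCheckSketch` and the method `warnIfUnknownRunner` are hypothetical, and only the runner names are taken from the check quoted above.

```java
import java.util.logging.Logger;

/** Hypothetical sketch of "warn instead of throw"; not the actual KafkaIO code. */
public class RunnerCheckSketch {
    private static final Logger LOG = Logger.getLogger(RunnerCheckSketch.class.getName());

    /**
     * Returns true if the runner is on the known list. For any other runner it
     * logs a warning and returns false instead of throwing an exception.
     */
    static boolean warnIfUnknownRunner(String runner) {
        boolean known =
            "org.apache.beam.runners.direct.DirectRunner".equals(runner)
                || runner.startsWith("org.apache.beam.runners.dataflow.")
                || runner.startsWith("org.apache.beam.runners.spark.");
        if (!known) {
            LOG.warning(
                runner
                    + " is not known to support exactly-once sinks; "
                    + "continuing, but delivery guarantees may be weaker.");
        }
        return known;
    }

    public static void main(String[] args) {
        // A runner outside the list only triggers a warning; the pipeline is not blocked.
        warnIfUnknownRunner("org.apache.beam.runners.flink.FlinkRunner");
    }
}
```

With this shape, a new runner does not have to be added to the list before users can try EOS mode; the list only decides whether a warning is logged.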