[ 
https://issues.apache.org/jira/browse/BEAM-6751?focusedWorklogId=205826&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-205826
 ]

ASF GitHub Bot logged work on BEAM-6751:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 28/Feb/19 14:54
            Start Date: 28/Feb/19 14:54
    Worklog Time Spent: 10m 
      Work Description: mxm commented on issue #7955: [BEAM-6751] Extend Kafka 
EOS mode whitelist / Warn instead of throw
URL: https://github.com/apache/beam/pull/7955#issuecomment-468302127
 
 
   You are correct: Flink's KafkaProducer starts a new transaction for every 
checkpoint and commits that transaction only once the checkpoint has 
completed. However, we do not replace KafkaIO with Flink's KafkaProducer 
because mapping KafkaIO's configuration to Flink's seems impractical.
   
   Having taken a closer look at Beam's `KafkaExactlyOnceSink`, I do not think 
we can achieve exactly-once semantics with the Flink Runner. We may process 
elements multiple times when we restore from a checkpoint. 
   
   That is unfortunate because Flink has all the building blocks to ensure 
exactly-once. It is also hard to sell to Beam users who are used to 
exactly-once guarantees with Flink's KafkaProducer.
   
   A way to fix this would be to provide an (optional) hook in DoFn for 
committing pending work. I suppose SDF would have something like this. 
Alternatively, we might have to consider translating KafkaIO to Flink's 
KafkaProducer.
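
A minimal sketch of what such a commit hook could look like, in plain Java with no Beam or Flink dependencies. Every name here (`CommitHookSketch`, `StagingSink`, `onCheckpointComplete`, `FakeTransactionalLog`) is hypothetical and not an existing Beam or Flink API. It also assumes that the completion notification always arrives before a failure and that the runner replays every element seen after the last completed checkpoint; a real implementation would have to handle both cases more carefully.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical illustration only; none of these types exist in Beam or Flink. */
public class CommitHookSketch {

    /** Stand-in for an external transactional system such as Kafka. */
    static class FakeTransactionalLog {
        final List<String> committed = new ArrayList<>();
    }

    /** Sink that stages output per checkpoint and commits only after completion. */
    static class StagingSink {
        private final FakeTransactionalLog log;
        private List<String> open = new ArrayList<>();
        private final TreeMap<Long, List<String>> pending = new TreeMap<>();

        StagingSink(FakeTransactionalLog log) {
            this.log = log;
        }

        /** Stage an element in the currently open "transaction". */
        void process(String element) {
            open.add(element);
        }

        /** Checkpoint taken: seal the open batch under this id and start a new one. */
        void snapshot(long checkpointId) {
            pending.put(checkpointId, open);
            open = new ArrayList<>();
        }

        /** The proposed hook: commit all batches sealed up to the completed checkpoint. */
        void onCheckpointComplete(long checkpointId) {
            NavigableMap<Long, List<String>> done = pending.headMap(checkpointId, true);
            for (List<String> batch : done.values()) {
                log.committed.addAll(batch);
            }
            done.clear(); // clearing the view also removes the entries from 'pending'
        }

        /** Restore after a failure: drop staged work, which the runner will replay. */
        void restore() {
            open = new ArrayList<>();
            pending.clear();
        }
    }

    /** Runs one failure-and-replay scenario and returns what was committed. */
    static List<String> runScenario() {
        FakeTransactionalLog log = new FakeTransactionalLog();
        StagingSink sink = new StagingSink(log);
        sink.process("a");
        sink.process("b");
        sink.snapshot(1L);
        sink.onCheckpointComplete(1L); // "a" and "b" become visible here
        sink.process("c");             // staged, not yet committed
        sink.restore();                // failure: staged "c" is discarded
        sink.process("c");             // runner replays "c"
        sink.snapshot(2L);
        sink.onCheckpointComplete(2L);
        return log.committed;
    }

    public static void main(String[] args) {
        System.out.println(runScenario()); // prints [a, b, c]
    }
}
```

Because nothing reaches the external log before the checkpoint completes, the replayed element is written only once, which is the behaviour a sink without such a hook cannot guarantee.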
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 205826)
    Time Spent: 0.5h  (was: 20m)

> KafkaIO blocks FlinkRunner in EOS mode
> --------------------------------------
>
>                 Key: BEAM-6751
>                 URL: https://issues.apache.org/jira/browse/BEAM-6751
>             Project: Beam
>          Issue Type: Bug
>          Components: io-java-kafka, runner-flink
>            Reporter: Maximilian Michels
>            Assignee: Maximilian Michels
>            Priority: Critical
>             Fix For: 2.12.0
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> KafkaIO has a validation check which whitelists certain runners capable of 
> providing exactly-once semantics:
> {noformat}
>         if ("org.apache.beam.runners.direct.DirectRunner".equals(runner)
>             || runner.startsWith("org.apache.beam.runners.dataflow.")
>             || runner.startsWith("org.apache.beam.runners.spark.")) {
> ...
> {noformat}
> The FlinkRunner supports exactly-once checkpointing but is blocked from using 
> Kafka's exactly once mode. 
> I wonder if such a list is easily maintainable? I think we should replace the 
> list with a warning instead. 
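
A minimal sketch of the "warn instead of throw" idea, assuming the runner is identified by its class name as in the excerpt above. The class `EosRunnerCheckSketch` and its methods are hypothetical and are not KafkaIO's actual code; `java.util.logging` is used only to keep the example self-contained.

```java
import java.util.logging.Logger;

/** Hypothetical illustration only; this is not the actual KafkaIO validation code. */
public class EosRunnerCheckSketch {
    private static final Logger LOG =
        Logger.getLogger(EosRunnerCheckSketch.class.getName());

    /** Same prefix checks as the whitelist quoted in the issue description. */
    static boolean isKnownEosCapable(String runner) {
        return "org.apache.beam.runners.direct.DirectRunner".equals(runner)
            || runner.startsWith("org.apache.beam.runners.dataflow.")
            || runner.startsWith("org.apache.beam.runners.spark.");
    }

    /**
     * Logs a warning for runners that are not on the list instead of throwing.
     * Returns true if the runner is on the list, false if a warning was logged.
     */
    static boolean warnIfUnknown(String runner) {
        if (isKnownEosCapable(runner)) {
            return true;
        }
        LOG.warning("Runner " + runner + " is not known to support exactly-once "
            + "semantics; make sure it does before relying on EOS mode.");
        return false;
    }

    public static void main(String[] args) {
        // The Flink runner is not on the quoted list, so this warns and continues.
        warnIfUnknown("org.apache.beam.runners.flink.FlinkRunner");
    }
}
```

With this shape the list stops being a hard gate: an unlisted runner produces a log message and the pipeline is still constructed.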



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
