[jira] [Work logged] (BEAM-6751) KafkaIO blocks FlinkRunner in EOS mode

ASF GitHub Bot (JIRA) Thu, 28 Feb 2019 07:52:17 -0800


     [ 
https://issues.apache.org/jira/browse/BEAM-6751?focusedWorklogId=205847&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-205847
 ]


ASF GitHub Bot logged work on BEAM-6751:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 28/Feb/19 15:51
            Start Date: 28/Feb/19 15:51
    Worklog Time Spent: 10m 
      Work Description: kennknowles commented on issue #7955: [BEAM-6751] 
Extend Kafka EOS mode whitelist / Warn instead of throw
URL: https://github.com/apache/beam/pull/7955#issuecomment-468324672
 
 
   Processing an element again is unavoidable in the face of failure, so do you 
mean that the resulting processing will cause duplicate (non-idempotent) 
externally visible effects?
   
   I just read up on [Flink's exactly-once 
KafkaProducer](https://ci.apache.org/projects/flink/flink-docs-stable/dev/connectors/kafka.html)
 so I can contribute more effectively to this conversation.
   
   There, it reads "during normal working of Flink applications, user can 
expect a delay in visibility of the records produced into Kafka topics, equal 
to average time between completed checkpoints." That is exactly the expected 
cost of implementing `@RequiresStableInput`. I had thought this was untenable 
for Flink because my (out-of-date) understanding was that in large-scale use 
checkpoints were set to be exceedingly rare.
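
   As a rough illustration of that visibility delay, here is a toy model in 
plain Java. It is only a sketch: `CheckpointGatedBuffer` and its methods are 
hypothetical names, not Flink or Beam API, and it assumes that records become 
visible only once the checkpoint covering them has completed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model; this is neither Flink's producer nor Beam's
// @RequiresStableInput implementation.
final class CheckpointGatedBuffer<T> {
  // Records written since the last completed checkpoint; not yet visible.
  private final List<T> pending = new ArrayList<>();
  // Records released by a completed checkpoint.
  private final List<T> visible = new ArrayList<>();

  // Buffers a record; it stays invisible until the next checkpoint completes.
  void add(T record) {
    pending.add(record);
  }

  // Called when a checkpoint completes: everything pending becomes visible.
  void onCheckpointComplete() {
    visible.addAll(pending);
    pending.clear();
  }

  // Returns a copy of the records that downstream readers could see.
  List<T> visible() {
    return new ArrayList<>(visible);
  }

  public static void main(String[] args) {
    CheckpointGatedBuffer<String> buffer = new CheckpointGatedBuffer<>();
    buffer.add("a");
    buffer.add("b");
    System.out.println("before checkpoint: " + buffer.visible());
    buffer.onCheckpointComplete();
    System.out.println("after checkpoint: " + buffer.visible());
  }
}
```

   In this model a record waits for whatever remains of the current checkpoint 
interval, which is why the checkpoint frequency sets the latency.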
   
   The historical cause, which I think you know, is that many Beam IOs were 
designed before Beam and/or around Dataflow, where you can do things like 
generate random numbers, shuffle to save the result, then use them and know 
that they will not be regenerated upon retry. The SparkRunner explicitly 
materializes after GBK to simulate this (at great cost, I would imagine). The 
mindset is hard to shake, and lots of people (myself included) don't have a 
deep understanding of all runners. We probably need to have an explicit push to 
fix Beam's IOs. It would be great to have a generic way to produce integration 
tests with failures that could be applied to any runner, to catch erroneous 
assumptions around durability.
   
   I don't quite understand your last point, about having a callback on a 
`DoFn` to commit pending work. There's `@FinishBundle` which must be called 
before elements are considered processed. I think you mean kind of the 
converse, like an `OK now it is durable so you can output other things` 
callback? I believe this is more-or-less what `@RequiresStableInput` achieves, 
or anyhow what it is hoping to achieve. I can't tell if this problem is 
fundamental and the model needs an extension (which I'm super OK with) or if it 
is just the way IOs are written.
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 205847)
    Time Spent: 40m  (was: 0.5h)

> KafkaIO blocks FlinkRunner in EOS mode
> --------------------------------------
>
>                 Key: BEAM-6751
>                 URL: https://issues.apache.org/jira/browse/BEAM-6751
>             Project: Beam
>          Issue Type: Bug
>          Components: io-java-kafka, runner-flink
>            Reporter: Maximilian Michels
>            Assignee: Maximilian Michels
>            Priority: Critical
>             Fix For: 2.12.0
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> KafkaIO has a validation check which whitelists certain runners capable of 
> providing exactly-once semantics:
> {noformat}
>         if ("org.apache.beam.runners.direct.DirectRunner".equals(runner)
>             || runner.startsWith("org.apache.beam.runners.dataflow.")
>             || runner.startsWith("org.apache.beam.runners.spark.")) {
> ...
> {noformat}
> The FlinkRunner supports exactly-once checkpointing but is blocked from using 
> Kafka's exactly-once mode.
> I wonder if such a list is easily maintainable? I think we should replace the 
> list with a warning instead. 
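
The "warn instead of throw" idea from the description above could look roughly 
like the sketch below. This is an assumption-laden illustration, not KafkaIO's 
actual code: the class `EosRunnerCheck`, the method `isKnownEosRunner`, and the 
warning text are hypothetical, and only the runner names come from the quoted 
check.

```java
import java.util.Arrays;
import java.util.List;
import java.util.logging.Logger;

// Hypothetical helper for illustration only; this is not KafkaIO's actual code.
final class EosRunnerCheck {
  private static final Logger LOG = Logger.getLogger(EosRunnerCheck.class.getName());

  // Exact class name and package prefixes taken from the check quoted in the issue.
  private static final String DIRECT_RUNNER = "org.apache.beam.runners.direct.DirectRunner";
  private static final List<String> KNOWN_PREFIXES = Arrays.asList(
      "org.apache.beam.runners.dataflow.",
      "org.apache.beam.runners.spark.");

  // Returns true if the runner is on the list. Otherwise logs a warning and
  // returns false instead of throwing, so the pipeline is not blocked.
  static boolean isKnownEosRunner(String runner) {
    if (DIRECT_RUNNER.equals(runner)) {
      return true;
    }
    for (String prefix : KNOWN_PREFIXES) {
      if (runner.startsWith(prefix)) {
        return true;
      }
    }
    LOG.warning("Runner " + runner + " is not known to support exactly-once sinks; "
        + "proceeding anyway.");
    return false;
  }

  public static void main(String[] args) {
    System.out.println(isKnownEosRunner("org.apache.beam.runners.spark.SparkRunner"));
    System.out.println(isKnownEosRunner("org.apache.beam.runners.flink.FlinkRunner"));
  }
}
```

With this shape an unlisted runner such as the FlinkRunner only triggers a log 
message, so the list no longer has to be complete for a pipeline to start.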



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
