[
https://issues.apache.org/jira/browse/BEAM-6751?focusedWorklogId=205847&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-205847
]
ASF GitHub Bot logged work on BEAM-6751:
----------------------------------------
Author: ASF GitHub Bot
Created on: 28/Feb/19 15:51
Start Date: 28/Feb/19 15:51
Worklog Time Spent: 10m
Work Description: kennknowles commented on issue #7955: [BEAM-6751]
Extend Kafka EOS mode whitelist / Warn instead of throw
URL: https://github.com/apache/beam/pull/7955#issuecomment-468324672
Processing an element again is unavoidable in the face of failure, so do you
mean that the resulting processing will cause duplicate (non-idempotent)
externally visible effects?
I just read up on [Flink's exactly-once
KafkaProducer](https://ci.apache.org/projects/flink/flink-docs-stable/dev/connectors/kafka.html)
so I can contribute more effectively to this conversation.
There, it reads "during normal working of Flink applications, user can
expect a delay in visibility of the records produced into Kafka topics, equal
to average time between completed checkpoints." That is exactly the expected
cost of implementing `@RequiresStableInput`. I had thought this was untenable
for Flink because my (out of date) understanding was that in large scale use
checkpoints were set to be exceedingly rare.
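
To make that cost concrete, here is a minimal sketch of the idea of holding output back until a checkpoint completes. It is a plain-Java simulation, not Beam or Flink API: the class `CheckpointGatedBuffer` and its methods are hypothetical names invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: simulates holding records back until a
// checkpoint completes. This is not how Flink or Beam implement it.
final class CheckpointGatedBuffer<T> {
  // Records produced since the last completed checkpoint; not yet visible.
  private final List<T> pending = new ArrayList<>();
  // Records released to downstream consumers.
  private final List<T> visible = new ArrayList<>();

  // Called for each produced record; the record stays invisible for now.
  void add(T element) {
    pending.add(element);
  }

  // Called when a checkpoint completes; pending records become visible.
  void onCheckpointComplete() {
    visible.addAll(pending);
    pending.clear();
  }

  // Returns a copy of the records that are visible so far.
  List<T> visible() {
    return new ArrayList<>(visible);
  }

  int pendingCount() {
    return pending.size();
  }
}
```

In this sketch, a record added just after a checkpoint stays invisible until the next one completes, so the visibility delay is bounded by the checkpoint interval.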
The historical cause, which I think you know, is that many Beam IOs were
designed before Beam and/or around Dataflow, where you can do things like
generate random numbers, shuffle to save the result, then use them and know
that they will not be regenerated upon retry. The SparkRunner explicitly
materializes after GBK to simulate this (at great cost, I would imagine). The
mindset is hard to shake, and lots of people (myself included) don't have a
deep understanding of all runners. We probably need to have an explicit push to
fix Beam's IOs. It would be great to have a generic way to produce integration
tests with failures that could be applied to any runner, to catch erroneous
assumptions around durability.
I don't quite understand your last point, about having a callback on a
`DoFn` to commit pending work. There's `@FinishBundle`, which must be called
before elements are considered processed. I think you mean kind of the
converse, like an `OK now it is durable so you can output other things`
callback? I believe this is more-or-less what `@RequiresStableInput` achieves,
or anyhow what it is hoping to achieve. I can't tell if this problem is
fundamental and the model needs an extension (which I'm super OK with) or if it
is just the way IOs are written.
Issue Time Tracking
-------------------
Worklog Id: (was: 205847)
Time Spent: 40m (was: 0.5h)
> KafkaIO blocks FlinkRunner in EOS mode
> --------------------------------------
>
> Key: BEAM-6751
> URL: https://issues.apache.org/jira/browse/BEAM-6751
> Project: Beam
> Issue Type: Bug
> Components: io-java-kafka, runner-flink
> Reporter: Maximilian Michels
> Assignee: Maximilian Michels
> Priority: Critical
> Fix For: 2.12.0
>
> Time Spent: 40m
> Remaining Estimate: 0h
>
> KafkaIO has a validation check which whitelists certain runners capable of
> providing exactly-once semantics:
> {noformat}
> if ("org.apache.beam.runners.direct.DirectRunner".equals(runner)
> || runner.startsWith("org.apache.beam.runners.dataflow.")
> || runner.startsWith("org.apache.beam.runners.spark.")) {
> ...
> {noformat}
> The FlinkRunner supports exactly-once checkpointing but is blocked from using
> Kafka's exactly-once mode.
> I wonder if such a list is easily maintainable? I think we should replace the
> list with a warning instead.
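
The proposal above (warn instead of throw) could look roughly like the following sketch. Only the three runner strings come from the issue description; the class `EosRunnerCheck`, its method names, and the warning text are assumptions for illustration and are not taken from KafkaIO.

```java
import java.util.logging.Logger;

// Hypothetical sketch: class and method names are not from KafkaIO.
final class EosRunnerCheck {
  private static final Logger LOG = Logger.getLogger(EosRunnerCheck.class.getName());

  // Mirrors the whitelist quoted in the issue description.
  static boolean isKnownEosRunner(String runner) {
    return "org.apache.beam.runners.direct.DirectRunner".equals(runner)
        || runner.startsWith("org.apache.beam.runners.dataflow.")
        || runner.startsWith("org.apache.beam.runners.spark.");
  }

  // Logs a warning for runners outside the list instead of throwing.
  // Returns true if a warning was logged.
  static boolean warnIfUnrecognized(String runner) {
    if (isKnownEosRunner(runner)) {
      return false;
    }
    LOG.warning(
        runner + " is not known to support exactly-once semantics; "
            + "make sure it does before relying on EOS mode.");
    return true;
  }
}
```

With this shape, a runner such as `org.apache.beam.runners.flink.FlinkRunner` would produce a log warning rather than a validation failure, leaving the decision to the user.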