[
https://issues.apache.org/jira/browse/BEAM-6333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16733426#comment-16733426
]
Raghu Angadi commented on BEAM-6333:
------------------------------------
> Would it be correct to say that the mechanism doesn't account for potential
> clockdrift between the broker and the beam node?
Yes. The 2 seconds are expected to cover things like that. The real fix is for
Kafka to provide its server-side timestamp.
Regarding determinism in your case, do those two pipelines read from the same
Kafka cluster or from two different ones? If they read from separate clusters,
do you tightly control the timestamps? If it is the same cluster, reprocessing
should see the same timestamps.
Overall, it looks like yours is a specific enough case that you can have your
own implementation that does the right thing. Watermarks are pretty hard for
casual users to reason about. Adding more options may not help much. The
current default implementation does the right thing and handles idle partitions
well. There is nothing worse than users having to debug stuck pipelines caused
by an obscure stalled watermark due to something like an idle partition... often
they are not experts on Kafka or on how the input is processed. More advanced
users like yourselves can utilize the APIs and do the right thing.
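To make the idle-partition behavior under discussion concrete, here is a
minimal stand-alone sketch. It is a simplified model, not Beam's actual
KafkaIO code: the class IdleAwareWatermark and its method names are
hypothetical, and it assumes the watermark may advance to "backlog check
time minus a delta" only when the reader has no backlog. Passing a null
delta models disabling the idle advance.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical stand-alone model; not Beam's actual timestamp policy API.
final class IdleAwareWatermark {
  // Assumption: null means "never advance the watermark on idle partitions".
  private final Duration idleDelta;
  private Instant watermark;

  IdleAwareWatermark(Instant initial, Duration idleDelta) {
    this.watermark = initial;
    this.idleDelta = idleDelta;
  }

  // The watermark follows record timestamps and never moves backwards.
  void onRecord(Instant recordTimestamp) {
    if (recordTimestamp.isAfter(watermark)) {
      watermark = recordTimestamp;
    }
  }

  // backlog == 0 means the reader had caught up as of backlogCheckTime.
  Instant currentWatermark(long backlog, Instant backlogCheckTime) {
    if (idleDelta != null && backlog == 0) {
      // The delta (e.g. 2 seconds) is slack for clock drift and in-flight records.
      Instant idleWatermark = backlogCheckTime.minus(idleDelta);
      if (idleWatermark.isAfter(watermark)) {
        watermark = idleWatermark;
      }
    }
    // With a backlog (or with idle advance disabled) the watermark stays put.
    return watermark;
  }
}
```

With a 2 second delta, an idle partition checked at time T would report a
watermark of T minus 2 seconds; with a null delta the watermark only moves
when records arrive, which is the stricter behavior asked for in the issue.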
Let us know if you still want to have an option to disable it. More
user-accessible options imply more perceived complexity.
> allow disabling/configuring automatic watermark generation for idle kafka
> partitions
> ------------------------------------------------------------------------------------
>
> Key: BEAM-6333
> URL: https://issues.apache.org/jira/browse/BEAM-6333
> Project: Beam
> Issue Type: Improvement
> Components: io-java-kafka
> Affects Versions: 2.9.0
> Reporter: Jan Doms
> Assignee: Raghu Angadi
> Priority: Major
> Labels: easyfix, pull-request-available
> Original Estimate: 0.5h
> Time Spent: 10m
> Remaining Estimate: 20m
>
> For pipelines that require the emitted watermarks to be absolutely correct,
> the current behavior of using a 2-second timeout could lead to problems.
> Therefore it should be possible to disable this behavior. While changing the
> code, the hardcoded 2 seconds can also be made configurable. (There's already
> a TODO about that in the code.)
>
> A (preliminary) PR making this change is available:
> https://github.com/apache/beam/pull/7382.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)