[ 
https://issues.apache.org/jira/browse/BEAM-6333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16733426#comment-16733426
 ] 

Raghu Angadi commented on BEAM-6333:
------------------------------------

> Would it be correct to say that the mechanism doesn't account for potential 
> clockdrift between the broker and the beam node?

Yes. 2 seconds is expected to cover things like that. The real fix is for Kafka 
to provide its server side timestamp.

Regd determinism in your case,  do those two pipelines read from same Kafka 
cluster or two different ones? If it is from separate clusters, do you tightly 
control the timestamps? If it is the same cluster, reprocessing should have the 
same timestamps.

Overall, it looks like yours is a specific enough case and you can have your 
own implementation that does the right thing. Watermark is a pretty hard for 
causual users to be concerned. Adding more options may not help much. The 
current default implantation does the right thing and handles idle partitions 
well. There is nothing worse than user's having to debug stuck pipelines due to 
obscure watermark stuckness due to something like an idle partition... often 
they are not experts on Kafka or how the input is processed.  More advanced 
users like yourselves can utilize the APIs and do the right thing.

Let us know if you still want to have an option to disable it. More user 
accessible options implies more perceived complexity. 


> allow disabling/configuring automatic watermark generation for idle kafka 
> partitions
> ------------------------------------------------------------------------------------
>
>                 Key: BEAM-6333
>                 URL: https://issues.apache.org/jira/browse/BEAM-6333
>             Project: Beam
>          Issue Type: Improvement
>          Components: io-java-kafka
>    Affects Versions: 2.9.0
>            Reporter: Jan Doms
>            Assignee: Raghu Angadi
>            Priority: Major
>              Labels: easyfix, pull-request-available
>   Original Estimate: 0.5h
>          Time Spent: 10m
>  Remaining Estimate: 20m
>
> For pipelines that require the emitted watermarks to be absolutely correct 
> the current behavior using 2 seconds timeout could lead to problems.
> Therefor it should be possible to disable this behavior. While changing the 
> code the hardcoded 2 seconds can also be made configurable. (There's already 
> a TODO about that in the code.)
>  
> A (preliminary) PR making this change is available: 
> https://github.com/apache/beam/pull/7382.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to