[ 
https://issues.apache.org/jira/browse/BEAM-6333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16732788#comment-16732788
 ] 

Jan Doms commented on BEAM-6333:
--------------------------------

Seems like my understanding was a bit limited (or just plain incorrect). Thanks 
for the pointer.

Would it be correct to say that the mechanism doesn't account for potential 
clock drift between the broker and the Beam node?

Or for scenarios where the broker restarts and, for whatever reason, its clock 
goes back in time a little, so that it then emits a record with a higher 
timestamp than the previous record, but lower than what Beam currently thinks 
the watermark should be?

(I'm thinking of Jepsen-test-like scenarios here ... but maybe I'm getting too 
paranoid.)
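
To make the second scenario concrete, here is a minimal sketch of how I 
picture it. It is not the KafkaIO code: the class and method names are 
hypothetical, timestamps are plain epoch millis, and it assumes the idle 
watermark is derived from the local clock minus the 2-second delta mentioned 
in the issue.

```java
// Hypothetical model, not the KafkaIO implementation.
final class IdleWatermarkSketch {
    // Assumption: mirrors the hardcoded 2 seconds mentioned in the issue.
    static final long IDLE_DELTA_MILLIS = 2_000L;

    private long watermarkMillis = Long.MIN_VALUE;

    // Advance the watermark from a record's broker-assigned timestamp.
    void onRecord(long recordTimestampMillis) {
        watermarkMillis = Math.max(watermarkMillis, recordTimestampMillis);
    }

    // Advance the watermark for an idle partition using the local node clock.
    void onIdle(long localClockMillis) {
        watermarkMillis = Math.max(watermarkMillis, localClockMillis - IDLE_DELTA_MILLIS);
    }

    // A record stamped below the current watermark would be treated as late.
    boolean wouldBeLate(long recordTimestampMillis) {
        return recordTimestampMillis < watermarkMillis;
    }

    long watermark() {
        return watermarkMillis;
    }

    public static void main(String[] args) {
        IdleWatermarkSketch wm = new IdleWatermarkSketch();
        wm.onRecord(10_000L); // broker stamps a record at t = 10s
        wm.onIdle(20_000L);   // partition is idle, node clock reads 20s
        long next = 15_000L;  // broker clock lags: next record is stamped 15s
        System.out.println("watermark=" + wm.watermark() + " late=" + wm.wouldBeLate(next));
    }
}
```

In this model the record at 15s is newer than the previous record (10s) but 
older than the watermark (18s), which is exactly the case I'm worried about.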

> Which determinism are you specifically referring to? Could you give an 
> example?

The computation I'm doing (transaction processing) should be a 'pure' function, 
in the sense that it should always give the same results (given the same 
inputs). It should in theory be possible to perform the computation in parallel 
in 2 separate datacenters and let them generate the same output. Or to 
(partially) redo the computation later to re-determine a historical state.

Any incorrect watermark would be a problem. I would be most confident that the 
behavior is correct if this mechanism were disabled and I generated heartbeat 
messages on each partition instead.

> Btw, your implementation will throw an exception, since the watermark can be 
> queried even before any records are read; better to return 'min timestamp' 
> in that case.

Thanks for the tip. It's actually still untested, and I guess you noticed ;)
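
A minimal sketch of what that guard could look like, assuming a watermark that 
only ever advances from record timestamps (heartbeats included) and never from 
the local clock. The class name is hypothetical, and MIN_TIMESTAMP_MILLIS is 
only a stand-in for Beam's BoundedWindow.TIMESTAMP_MIN_VALUE so that the 
example stays self-contained.

```java
// Hypothetical sketch, not the code from the PR.
final class RecordDrivenWatermark {
    // Stand-in for BoundedWindow.TIMESTAMP_MIN_VALUE (assumption: plain epoch millis).
    static final long MIN_TIMESTAMP_MILLIS = Long.MIN_VALUE;

    // null until the first record (or heartbeat) has been read.
    private Long lastTimestampMillis = null;

    // Called for every record, including heartbeat messages.
    void onRecord(long recordTimestampMillis) {
        if (lastTimestampMillis == null || recordTimestampMillis > lastTimestampMillis) {
            lastTimestampMillis = recordTimestampMillis;
        }
    }

    // Safe to call before any record is read: falls back to the minimum timestamp
    // instead of throwing.
    long watermark() {
        return lastTimestampMillis == null ? MIN_TIMESTAMP_MILLIS : lastTimestampMillis;
    }

    public static void main(String[] args) {
        RecordDrivenWatermark wm = new RecordDrivenWatermark();
        System.out.println(wm.watermark() == MIN_TIMESTAMP_MILLIS); // no records yet
        wm.onRecord(5_000L);
        wm.onRecord(4_000L); // an older timestamp never moves the watermark back
        System.out.println(wm.watermark());
    }
}
```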

> allow disabling/configuring automatic watermark generation for idle kafka 
> partitions
> ------------------------------------------------------------------------------------
>
>                 Key: BEAM-6333
>                 URL: https://issues.apache.org/jira/browse/BEAM-6333
>             Project: Beam
>          Issue Type: Improvement
>          Components: io-java-kafka
>    Affects Versions: 2.9.0
>            Reporter: Jan Doms
>            Assignee: Raghu Angadi
>            Priority: Major
>              Labels: easyfix, pull-request-available
>   Original Estimate: 0.5h
>          Time Spent: 10m
>  Remaining Estimate: 20m
>
> For pipelines that require the emitted watermarks to be absolutely correct, 
> the current behavior of using a 2-second timeout could lead to problems.
> Therefore it should be possible to disable this behavior. While changing the 
> code, the hardcoded 2 seconds can also be made configurable. (There's already 
> a TODO about that in the code.)
>  
> A (preliminary) PR making this change is available: 
> https://github.com/apache/beam/pull/7382.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
