[ 
https://issues.apache.org/jira/browse/BEAM-6333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16732460#comment-16732460
 ] 

Jan Doms commented on BEAM-6333:
--------------------------------

My use case is an attempt/experiment to implement a deterministic database on 
top of a stream processor.

More info can be found on 
[https://domsj.info/2018/12/30/introducing-streamy-db.html] and 
[https://github.com/domsj/streamy-db|https://github.com/domsj/streamy-db.].

If there is some automatic timeout then you can never be completely sure that 
no record with an older timestamp will be emitted? (What if the kafka broker or 
network is slow for some reason?)

(Or at least, you can not be sure in a deterministic way! That is, in a way 
that can later be repeated by a new instance of the same computation.)

Not sure if you want to cater to this kind of use case... (and actually I 
already worked around it by implementing my own TimestampPolicyFactory, but 
still thought I may as well contribute this)

So to be clear, for my case I'm not just looking at increasing the timeout, I 
actually want to disable the behavior entirely, and have added a heartbeat 
mechanism to my project instead (that pushes dummy messages on each partition 
periodically to keep the watermarks flowing).

> allow disabling/configuring automatic watermark generation for idle kafka 
> partitions
> ------------------------------------------------------------------------------------
>
>                 Key: BEAM-6333
>                 URL: https://issues.apache.org/jira/browse/BEAM-6333
>             Project: Beam
>          Issue Type: Improvement
>          Components: io-java-kafka
>    Affects Versions: 2.9.0
>            Reporter: Jan Doms
>            Assignee: Raghu Angadi
>            Priority: Major
>              Labels: easyfix, pull-request-available
>   Original Estimate: 0.5h
>          Time Spent: 10m
>  Remaining Estimate: 20m
>
> For pipelines that require the emitted watermarks to be absolutely correct 
> the current behavior using 2 seconds timeout could lead to problems.
> Therefor it should be possible to disable this behavior. While changing the 
> code the hardcoded 2 seconds can also be made configurable. (There's already 
> a TODO about that in the code.)
>  
> A (preliminary) PR making this change is available: 
> https://github.com/apache/beam/pull/7382.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to