[
https://issues.apache.org/jira/browse/BEAM-5063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ismaël Mejía reassigned BEAM-5063:
----------------------------------
Assignee: Krzysztof Trubalski (was: Jean-Baptiste Onofré)
> Watermark does not progress for low traffic streams
> ---------------------------------------------------
>
> Key: BEAM-5063
> URL: https://issues.apache.org/jira/browse/BEAM-5063
> Project: Beam
> Issue Type: Bug
> Components: io-java-kinesis
> Affects Versions: 2.5.0
> Reporter: Krzysztof Trubalski
> Assignee: Krzysztof Trubalski
> Priority: Major
> Time Spent: 2h 50m
> Remaining Estimate: 0h
>
> We have a Dataflow job copying data from multiple Kinesis streams into
> BigQuery. Recently we noticed that the watermark on one of the streams
> frequently gets stuck although data from that stream is still being processed
> (it progresses only when the traffic increases or the Dataflow autoscaling
> feature kicks in).
>
> Looking at the CloudWatch statistics for the affected stream, it has a
> really low traffic rate: only ~1 event every few minutes. After
> investigating and discussing the issue with Google's Dataflow team, it looks
> like with such a small amount of data on the stream, the function calculating
> the watermark in KinesisReader reports progress incorrectly.
>
> From my initial investigation, I suspect that the issue might be related to
> the use of MovingFunction in KinesisReader. In the current implementation, it
> covers a 1-minute period of samples. Because obtaining the min value flushes
> stale values, when the traffic is very low the subsequent significance
> check always returns false (it relies on the number of samples, and most
> of them were flushed by the get() invocation).
>
>
>
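The suspected mechanism can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions: SlidingMinSketch is a hypothetical, simplified stand-in and is not Beam's MovingFunction or the KinesisReader code; the 10-sample significance threshold and the 3-minute event interval are made-up values chosen for illustration. Only the 1-minute window and the "flush on read, then check the sample count" order come from the description above.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical, simplified moving-window minimum. NOT Beam's MovingFunction. */
class SlidingMinSketch {
    private final long windowMs;   // how long a sample stays in the window
    private final int minSamples;  // assumed significance threshold (illustrative)
    // Each entry is {arrivalTimeMs, value}, oldest first.
    private final Deque<long[]> samples = new ArrayDeque<>();

    SlidingMinSketch(long windowMs, int minSamples) {
        this.windowMs = windowMs;
        this.minSamples = minSamples;
    }

    void add(long nowMs, long value) {
        flush(nowMs);
        samples.addLast(new long[] {nowMs, value});
    }

    /** Returns the minimum over the window; reading also drops stale samples. */
    long getMin(long nowMs) {
        flush(nowMs);
        long min = Long.MAX_VALUE;
        for (long[] s : samples) {
            min = Math.min(min, s[1]);
        }
        return min;
    }

    /** Significance depends only on how many samples survived the last flush. */
    boolean isSignificant() {
        return samples.size() >= minSamples;
    }

    private void flush(long nowMs) {
        while (!samples.isEmpty() && samples.peekFirst()[0] <= nowMs - windowMs) {
            samples.removeFirst();
        }
    }

    /** Replays one event every intervalMs and counts reads that pass the check. */
    static int countSignificantReads(long intervalMs, int events, long windowMs, int minSamples) {
        SlidingMinSketch f = new SlidingMinSketch(windowMs, minSamples);
        int significant = 0;
        for (int i = 0; i < events; i++) {
            long now = i * intervalMs;
            f.add(now, now);
            f.getMin(now);              // flushes everything older than the window
            if (f.isSignificant()) {    // checked after the flush, as described above
                significant++;
            }
        }
        return significant;
    }
}
```

In this model, one event every 3 minutes with a 1-minute window leaves at most one sample after each read, so countSignificantReads(180_000L, 20, 60_000L, 10) never passes the check, while one event per second passes it as soon as 10 samples have accumulated. A watermark that advances only when the check passes would therefore stall on the low-traffic stream.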