[ 
https://issues.apache.org/jira/browse/BEAM-5063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía reassigned BEAM-5063:
----------------------------------

    Assignee: Krzysztof Trubalski  (was: Jean-Baptiste Onofré)

> Watermark does not progress for low traffic streams
> ---------------------------------------------------
>
>                 Key: BEAM-5063
>                 URL: https://issues.apache.org/jira/browse/BEAM-5063
>             Project: Beam
>          Issue Type: Bug
>          Components: io-java-kinesis
>    Affects Versions: 2.5.0
>            Reporter: Krzysztof Trubalski
>            Assignee: Krzysztof Trubalski
>            Priority: Major
>          Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> We have a Dataflow Job copying data from multiple Kinesis streams into Big 
> Query. Recently we have noticed that the watermark on one of the streams 
> frequently gets stuck although data from that stream is still being processed 
> (it progress only when the traffic increases or Dataflow autoscaling feature 
> kicks in).
>   
>  Looking at the CloudWatch statistics for the affected stream, it has a 
> really low traffic rate - only ~1 event every few minutes . After 
> investigation and consulting the issue with Google's Dataflow Team, it looks 
> like with such small amount of data on the stream, the function calculating 
> the watermark in KinesisReader reports progress incorrectly.
>   
>  From my initial investigation, I suspect that the issue might be related to 
> usage of MovingFunction in KinesisReader. In the current implementation, it 
> covers 1 minute period of samples, since obtaining the min value flushes 
> stale values, if the traffic is very low the following call to significance 
> check always returns false (as it relies on the number of samples, and most 
> of them were flushed by get() invocation).
>   
>   
>   



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to