Sam Whittle created BEAM-9439:
---------------------------------

             Summary: KinesisReader does not report correct backlog statistics 
                 Key: BEAM-9439
                 URL: https://issues.apache.org/jira/browse/BEAM-9439
             Project: Beam
          Issue Type: Bug
          Components: io-java-kinesis
            Reporter: Sam Whittle


The KinesisReader implementing KinesisIO reports backlog by implementing the
UnboundedSource.getTotalBacklogBytes()
method as opposed to the
UnboundedSource.getSplitBacklogBytes()

This value is supposed to represent the total backlog across all shards.  This 
function is implemented by calling SimplifiedKinesisClient.getBacklogBytes with 
the watermark of the kinesis shards managed within the UnboundedReader 
instance.  As this watermark may be further ahead than the watermark across all 
shards, this may miss backlog bytes.

An additional concern is that the watermark is calculated using a 
WatermarkPolicy, which means that the watermark may be inconsistent to the 
kinesis timestamp for querying backlog.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to