Sam Whittle created BEAM-9439:
---------------------------------
Summary: KinesisReader does not report correct backlog statistics
Key: BEAM-9439
URL: https://issues.apache.org/jira/browse/BEAM-9439
Project: Beam
Issue Type: Bug
Components: io-java-kinesis
Reporter: Sam Whittle
The KinesisReader implementing KinesisIO reports backlog by implementing the
UnboundedSource.getTotalBacklogBytes()
method as opposed to the
UnboundedSource.getSplitBacklogBytes()
This value is supposed to represent the total backlog across all shards. This
function is implemented by calling SimplifiedKinesisClient.getBacklogBytes with
the watermark of the kinesis shards managed within the UnboundedReader
instance. As this watermark may be further ahead than the watermark across all
shards, this may miss backlog bytes.
An additional concern is that the watermark is calculated using a
WatermarkPolicy, which means that the watermark may be inconsistent to the
kinesis timestamp for querying backlog.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)