Nicholas Telford created KAFKA-20711:
----------------------------------------

             Summary: Streams task restore-remaining-records metric invalid under EOS
                 Key: KAFKA-20711
                 URL: https://issues.apache.org/jira/browse/KAFKA-20711
             Project: Kafka
          Issue Type: Bug
          Components: streams
    Affects Versions: 4.3.0
            Reporter: Nicholas Telford
            Assignee: Nicholas Telford


The Kafka Streams Task-level metric {{restore-remaining-records}} is intended 
to track the total number of records that still need to be restored.

When the application runs under exactly-once semantics (EOS), this metric is 
inaccurate: it never drops to 0 for fully restored tasks, and it always reports 
values substantially higher than the real number of records left to restore.

The root cause is that the metric is initialized with a total number of records 
to restore, derived as {{logEndOffset - committedOffset}}, using a 
READ_UNCOMMITTED consumer.

This offset range naturally includes uncommitted records and transaction 
markers, in addition to the actual records to restore.

When decrementing the metric during restore, we decrement by the actual number 
of (committed) records that were restored. Since this excludes uncommitted 
records and transaction markers, the decrements never add up to the total the 
metric was initialized with, so it cannot reach 0.

I have a fix that I will raise a PR for.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
