Matthias J. Sax created KAFKA-18344:
---------------------------------------
Summary: Consider to distinguish between multiple "positions"
Key: KAFKA-18344
URL: https://issues.apache.org/jira/browse/KAFKA-18344
Project: Kafka
Issue Type: Improvement
Components: clients, consumer
Reporter: Matthias J. Sax
KafkaConsumer currently maintains a "position", which is the max offset of
records returned via `poll()`.
This "position" is used to compute the consumer "lag metrics". This implies
that lag is computed slightly differently on the consumer compared to other
tools which use `endOffset - committedOffset`, because "position" does not
reflect the latest _processed_ record, but might be ahead of what the
application code has actually processed. If lag is computed as
`endOffset - committedOffset`, the reported lag errs on the high side, i.e.,
it is never smaller than the real lag, which might actually provide better
semantics. It seems undesirable that the consumer lag metric could be smaller
than the actual lag.
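To make the difference concrete, here is a minimal arithmetic sketch of the two lag definitions described above. It is not the consumer's actual metrics code; the class name, method names, and offsets are made up for illustration.

```java
public final class LagSketch {

    // Lag relative to the consumer "position" (what poll() has already returned).
    static long positionLag(long endOffset, long position) {
        return endOffset - position;
    }

    // Lag relative to the last committed offset, as external tools compute it.
    static long committedLag(long endOffset, long committedOffset) {
        return endOffset - committedOffset;
    }

    public static void main(String[] args) {
        // Hypothetical offsets for a single partition.
        long endOffset = 1_000L;  // log end offset on the broker
        long position = 900L;     // records up to 899 were returned by poll()
        long committed = 700L;    // application has processed and committed up to 699

        System.out.println("position-based lag:  " + positionLag(endOffset, position));   // 100
        System.out.println("committed-based lag: " + committedLag(endOffset, committed)); // 300
    }
}
```

In this example the position-based metric reports a lag of 100 even though 300 records have not yet been processed and committed by the application.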
We should consider updating the position of the consumer differently:
# A simple change could be to update the position to the offset of the
first/oldest record in a `poll()` call (instead of the latest/newest, as we do
right now), so that the position does not get ahead and lag is not "too small".
# We could also try to hook into the returned `ConsumerRecords` iterator to
track the position in a more fine-grained way, on a per-record basis.
# We could track multiple positions, like "processed position" and "fetched
position" (note that "fetched position" might be even further ahead than the
current position, because with `max.poll.records` not all fetched records
might be returned from `poll()`).
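
The second and third ideas could be combined roughly as in the following sketch. It is deliberately independent of the Kafka client: `Positions` and `TrackingIterator` are hypothetical names, not existing Kafka classes, and the sketch assumes that a record counts as "processed" once the application asks the iterator for the next one.

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.ToLongFunction;

public final class PositionTrackingSketch {

    /** Hypothetical holder for several positions of one partition (next offset to use). */
    static final class Positions {
        long fetched;    // next offset after everything fetched from the broker
        long returned;   // next offset after everything handed out by poll()
        long processed;  // next offset after everything the application finished
    }

    /** Wraps a record iterator and advances the processed position per record. */
    static final class TrackingIterator<T> implements Iterator<T> {
        private final Iterator<T> delegate;
        private final ToLongFunction<T> offsetOf;
        private final Positions positions;
        private long lastHandedOut = -1L; // offset of the record currently in the caller's hands

        TrackingIterator(Iterator<T> delegate, ToLongFunction<T> offsetOf, Positions positions) {
            this.delegate = delegate;
            this.offsetOf = offsetOf;
            this.positions = positions;
        }

        @Override
        public boolean hasNext() {
            markLastProcessed();
            return delegate.hasNext();
        }

        @Override
        public T next() {
            markLastProcessed();
            T rec = delegate.next();
            lastHandedOut = offsetOf.applyAsLong(rec);
            return rec;
        }

        // Assumption: coming back to the iterator means the previous record is done.
        private void markLastProcessed() {
            if (lastHandedOut >= 0) {
                positions.processed = lastHandedOut + 1;
                lastHandedOut = -1L;
            }
        }
    }

    public static void main(String[] args) {
        // Offsets 10..14 were fetched, but only 10..12 are returned (think max.poll.records=3).
        Positions p = new Positions();
        p.fetched = 15L;
        p.returned = 13L;
        p.processed = 10L;

        Iterator<Long> it = new TrackingIterator<>(List.of(10L, 11L, 12L).iterator(), Long::longValue, p);
        it.next(); // application works on offset 10
        it.next(); // offset 10 is done, application now works on offset 11

        long endOffset = 20L;
        System.out.println("lag vs fetched:   " + (endOffset - p.fetched));   // 5
        System.out.println("lag vs returned:  " + (endOffset - p.returned));  // 7
        System.out.println("lag vs processed: " + (endOffset - p.processed)); // 9
    }
}
```

A record that is still being worked on is not counted as processed, so in this model the processed position never runs ahead of the application.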
--
This message was sent by Atlassian Jira
(v8.20.10#820010)