Thomas Crowley created KAFKA-7657:
-------------------------------------
Summary: Invalid reporting of StreamState in KafkaStreams
application
Key: KAFKA-7657
URL: https://issues.apache.org/jira/browse/KAFKA-7657
Project: Kafka
Issue Type: Bug
Components: streams
Affects Versions: 2.0.1
Reporter: Thomas Crowley
We have a streams application with 3 instances running, two of which are
reporting the state of `REBALANCING` even after they have been running for
days. Restarting the application has no effect on the stream state.
This seems suspect because each instance appears to be processing messages, and
the `kafka-consumer-groups` CLI tool reports no offset lag in any of the
partitions assigned to the `REBALANCING` consumers. Each partition seems to be
processing an equal amount of records too.
Inspecting the `state.dir` on disk, it looks like the RocksDB state has been
built and hovers at the expected size on disk.
This problem has persisted for us after we rebuilt our Kafka cluster and reset
topics + consumer groups in our dev environment.
There is nothing in the logs (with level set to `DEBUG`) in both the broker or
the application that suggests something exceptional has happened causing the
application to be stuck `REBALANCING`
We are also running multiple streaming applications where this problem does not
exist.
Two differences between this application and our other streaming applications
are:
* We have `processing.guarantee` set to `exactly_once`
* We are using a `ValueTransformer` which fetches from and puts data on a
windowed state store
The `REBALANCING` state is returned from both polling the `state` method of our
`KafkaStreams` instance, and our custom metric which is derived from some logic
in a `KafkaStreams.StateListener` class attached via the `setStateListener`
method.
While I have provided a bit of context, before I reply with some reproducible
code - is there a simple way in which I can determine that my streams
application is in a `RUNNING` state without relying on the same mechanisms as
used above?
Further, given that it seems like my application is actually running - could
this perhaps be a bug to do with how the stream state is being reported (in the
context of a transactional stream using the processor API)?
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)