[ https://issues.apache.org/jira/browse/KAFKA-10229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raman Gupta updated KAFKA-10229:
--------------------------------
    Summary: Kafka stream dies for no apparent reason, no errors logged on 
client or server  (was: Kafka stream dies when earlier shut down node leaves 
group, no errors logged on client)

> Kafka stream dies for no apparent reason, no errors logged on client or server
> ------------------------------------------------------------------------------
>
>                 Key: KAFKA-10229
>                 URL: https://issues.apache.org/jira/browse/KAFKA-10229
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 2.4.1
>            Reporter: Raman Gupta
>            Priority: Major
>
> My broker and clients are on 2.4.1, and I'm currently running a single 
> broker. I have a Kafka stream with exactly-once processing turned on, and an 
> uncaught exception handler defined on the client. I noticed that the stream 
> was lagging and, on investigation, found that its consumer group was empty.
> On restarting the consumers, the consumer group re-established itself, but 
> after about 8 minutes, the group became empty again. There is nothing logged 
> on the client side about any stream errors, despite the existence of an 
> uncaught exception handler.
> The broker logs show the following about 8 minutes after the clients restart 
> and the stream goes to the RUNNING state:
> ```
> [2020-07-02 17:34:47,033] INFO [GroupCoordinator 0]: Member 
> cis-d7fb64c95-kl9wl-1-630af77f-138e-49d1-b76a-6034801ee359 in group 
> produs-cisFileIndexer-stream has failed, removing it from the group 
> (kafka.coordinator.group.GroupCoordinator)
> [2020-07-02 17:34:47,033] INFO [GroupCoordinator 0]: Preparing to rebalance 
> group produs-cisFileIndexer-stream in state PreparingRebalance with old 
> generation 228 (__consumer_offsets-3) (reason: removing member 
> cis-d7fb64c95-kl9wl-1-630af77f-138e-49d1-b76a-6034801ee359 on heartbeat 
> expiration) (kafka.coordinator.group.GroupCoordinator)
> ```
> So according to this, the consumer heartbeat has expired. I don't know why 
> this would be: logging shows that the stream was running and processing 
> messages normally, and then it simply stopped processing anything about 4 
> minutes before it died, with no apparent errors or issues and nothing logged 
> via the uncaught exception handler.
> It doesn't appear to be related to any specific poison-pill message: 
> restarting the stream causes it to reprocess a batch of further messages from 
> the backlog, and then die again approximately 8 minutes later. At the time of 
> the last message consumed by the stream, there are no logs at `INFO` level or 
> above, and no errors of any kind, in either the client or the broker. The 
> stream consumption simply stops.
> There are two consumers, but the same thing happens even if I limit 
> consumption to a single consumer.
> The runtime environment is Kubernetes.
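
Since the stream stops processing without logging anything and without invoking the uncaught exception handler, one way to see what the processing thread is doing at that moment is an in-process thread dump. The following is only a minimal diagnostic sketch using the JDK's `ThreadMXBean`, not something taken from the report: the class name `StreamThreadDump` and the method `dumpThreads` are hypothetical, and filtering on the name fragment `StreamThread` assumes the Kafka Streams processing threads carry that fragment in their names.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: captures the state and stack of threads matching a name fragment.
public class StreamThreadDump {

    // Returns one formatted entry (name, state, stack frames) per live thread
    // whose name contains nameFragment.
    static List<String> dumpThreads(String nameFragment) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> entries = new ArrayList<>();
        // false, false: skip lock monitor and synchronizer details to keep the dump cheap.
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info == null || !info.getThreadName().contains(nameFragment)) {
                continue;
            }
            StringBuilder sb = new StringBuilder();
            sb.append(info.getThreadName())
              .append(" state=").append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("    at ").append(frame).append('\n');
            }
            entries.add(sb.toString());
        }
        return entries;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for a stalled processing thread; the name is an assumption
        // chosen to resemble a Kafka Streams thread name.
        Thread stalled = new Thread(() -> {
            try {
                Thread.sleep(60_000);
            } catch (InterruptedException ignored) {
                // exit quietly when interrupted
            }
        }, "demo-StreamThread-1");
        stalled.setDaemon(true);
        stalled.start();
        Thread.sleep(200); // give the demo thread time to start sleeping

        dumpThreads("StreamThread").forEach(System.out::println);
    }
}
```

Calling `dumpThreads("StreamThread")` from a scheduled task inside the application, or running `jstack` against the pod's JVM during the roughly 4-minute window in which processing has stopped but the member has not yet been removed, might show whether the thread is blocked in user code, in a client call, or elsewhere.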



--
This message was sent by Atlassian Jira
(v8.3.4#803005)