[
https://issues.apache.org/jira/browse/KAFKA-8165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
A. Sophie Blee-Goldman resolved KAFKA-8165.
-------------------------------------------
Resolution: Fixed
> Streams task causes Out Of Memory after connection issues and store
> restoration
> -------------------------------------------------------------------------------
>
> Key: KAFKA-8165
> URL: https://issues.apache.org/jira/browse/KAFKA-8165
> Project: Kafka
> Issue Type: Bug
> Components: streams
> Affects Versions: 2.1.0
> Environment: 3 nodes, 22 topics, 16 partitions per topic, 1 window
> store, 4 KV stores.
> Kafka Streams application cluster: 3 AWS t2.large instances (8GB mem). 1
> application instance, 2 threads per instance.
> Kafka 2.1, Kafka Streams 2.1
> Amazon Linux.
> Scala application, running in Docker based on OpenJDK 9.
> Reporter: Di Campo
> Priority: Major
>
> We have a Kafka Streams 2.1 application. When the Kafka brokers are stable, the
> (largely stateful) application consumes ~160 messages per second at a sustained
> rate for several hours.
> However, at some point it started having connection issues with the brokers:
> {code:java}
> Connection to node 3 (/172.31.36.118:9092) could not be established. Broker
> may not be available. (org.apache.kafka.clients.NetworkClient){code}
> It also began showing a lot of these warnings:
> {code:java}
> WARN [Consumer
> clientId=stream-processor-81e1ce17-1765-49f8-9b44-117f983a2d19-StreamThread-2-consumer,
> groupId=stream-processor] 1 partitions have leader brokers without a
> matching listener, including [broker-2-health-check-0]
> (org.apache.kafka.clients.NetworkClient){code}
> In fact, the _health-check_ topic exists on the broker but is not consumed by this
> topology or used in any way by the Streams application (it is just a broker
> health check). There are no such complaints about topics that the topology
> actually consumes.
> Some time after these errors (which appear at a rate of about 24 per second
> for ~5 minutes), the following logs appear:
> {code:java}
> [2019-03-27 15:14:47,709] WARN [Consumer
> clientId=stream-processor-81e1ce17-1765-49f8-9b44-117f983a2d19-StreamThread-1-restore-consumer,
> groupId=] Connection to node -3 (/ip3:9092) could not be established. Broker
> may not be available. (org.apache.kafka.clients.NetworkClient){code}
> In between 6 and then 3 lines of "Connection could not be established"
> messages, 3 messages like the following slipped in:
> {code:java}
> [2019-03-27 15:14:47,723] WARN Started Restoration of visitorCustomerStore
> partition 15 total records to be restored 17
> (com.divvit.dp.streams.applications.monitors.ConsoleGlobalRestoreListener){code}
>
> ... one for each of the different KV stores I have (there is still another KV store
> that does not appear, and a WindowedStore that also does not appear).
> Then I finally see "Restoration Complete" messages for all of my stores (using a
> logging ConsoleGlobalRestoreListener, as described in the docs), so it seems it
> should now be safe to resume processing.
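> For reference, here is a minimal sketch of the kind of logging restore listener
> meant here (the message wording and method bodies are illustrative, not the exact
> class from my application), registered via KafkaStreams#setGlobalStateRestoreListener:
> {code:java}
> // Minimal sketch of a logging StateRestoreListener (Kafka Streams 2.1 API);
> // the class name matches the logs above, the bodies are illustrative.
> import org.apache.kafka.common.TopicPartition;
> import org.apache.kafka.streams.processor.StateRestoreListener;
>
> public class ConsoleGlobalRestoreListener implements StateRestoreListener {
>
>     @Override
>     public void onRestoreStart(final TopicPartition partition, final String storeName,
>                                final long startingOffset, final long endingOffset) {
>         System.out.println("Started Restoration of " + storeName
>                 + " partition " + partition.partition()
>                 + " total records to be restored " + (endingOffset - startingOffset));
>     }
>
>     @Override
>     public void onBatchRestored(final TopicPartition partition, final String storeName,
>                                 final long batchEndOffset, final long numRestored) {
>         System.out.println("Restored batch of " + numRestored + " records for "
>                 + storeName + " partition " + partition.partition());
>     }
>
>     @Override
>     public void onRestoreEnd(final TopicPartition partition, final String storeName,
>                              final long totalRestored) {
>         System.out.println("Restoration Complete for " + storeName
>                 + " partition " + partition.partition());
>     }
> }
>
> // Registered once on the KafkaStreams instance, before streams.start():
> // streams.setGlobalStateRestoreListener(new ConsoleGlobalRestoreListener());
> {code}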
> Three minutes later, some events get processed, and I see an OOM error:
> {code:java}
> java.lang.OutOfMemoryError: GC overhead limit exceeded{code}
>
> ... so given that the application usually processes for hours under the same
> circumstances, I'm wondering whether there is a memory leak in the
> connection resources or somewhere in the handling of this scenario.
> Kafka and Kafka Streams 2.1
--
This message was sent by Atlassian Jira
(v8.3.4#803005)