[ https://issues.apache.org/jira/browse/KAFKA-6199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robin Tweedie resolved KAFKA-6199.
----------------------------------
       Resolution: Fixed
    Fix Version/s: 1.1.0

We upgraded this single broker to 1.1.0 (keeping the log format on 0.9) and
have had 24 hours with no heap growth. I can only assume this leak was fixed
somewhere between 0.10.2.1 and 1.1.0.

> Single broker with fast growing heap usage
> ------------------------------------------
>
>                 Key: KAFKA-6199
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6199
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.10.2.1
>         Environment: Amazon Linux
>            Reporter: Robin Tweedie
>            Priority: Major
>             Fix For: 1.1.0
>
>         Attachments: Screen Shot 2017-11-10 at 1.55.33 PM.png,
> Screen Shot 2017-11-10 at 11.59.06 AM.png, dominator_tree.png,
> histo_live.txt, histo_live_20171206.txt, histo_live_80.txt,
> jstack-2017-12-08.scrubbed.out, merge_shortest_paths.png, path2gc.png
>
>
> We have a single broker in our cluster of 25 with fast growing heap usage
> which necessitates us restarting it every 12 hours. If we don't restart the
> broker, it becomes very slow from long GC pauses and eventually has
> {{OutOfMemory}} errors.
> See {{Screen Shot 2017-11-10 at 11.59.06 AM.png}} for a graph of heap usage
> percentage on the broker. A "normal" broker in the same cluster stays below
> 50% (averaged) over the same time period.
> We have taken heap dumps when the broker's heap usage is getting dangerously
> high, and there are a lot of retained {{NetworkSend}} objects referencing
> byte buffers.
> We also noticed that the single affected broker logs a lot more of this kind
> of warning than any other broker:
> {noformat}
> WARN Attempting to send response via channel for which there is no open
> connection, connection id 13 (kafka.network.Processor)
> {noformat}
> See {{Screen Shot 2017-11-10 at 1.55.33 PM.png}} for counts of that WARN log
> message visualized across all the brokers (to show it happens a bit on other
> brokers, but not nearly as much as it does on the "bad" broker).
> I can't make the heap dumps public, but would appreciate advice on how to pin
> down the problem better. We're currently trying to narrow it down to a
> particular client, but without much success so far.
> Let me know what else I could investigate or share to track down the source
> of this leak.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
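
The issue compares how often each broker logs the "no open connection" warning. A minimal sketch of that comparison follows, assuming each broker's server log has already been read into a list of lines and that the warning appears on a single line in the real log. The function names `count_warnings` and `count_warnings_per_broker` are hypothetical helpers for illustration, not part of Kafka.

```python
import re
from collections import Counter

# Matches the warning quoted in the issue. The connection id varies, so it is
# captured rather than hard-coded. Assumption: the message is on one log line.
WARN_PATTERN = re.compile(
    r"WARN Attempting to send response via channel for which there is no open "
    r"connection, connection id (\S+)"
)


def count_warnings(lines):
    """Return a Counter mapping connection id -> number of matching WARN lines."""
    counts = Counter()
    for line in lines:
        match = WARN_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def count_warnings_per_broker(logs_by_broker):
    """Return a dict mapping broker name -> total matching WARN lines.

    `logs_by_broker` is a hypothetical mapping from a broker label to an
    iterable of that broker's log lines.
    """
    return {
        broker: sum(count_warnings(lines).values())
        for broker, lines in logs_by_broker.items()
    }
```

Sorting the result of `count_warnings_per_broker` by value would surface a broker that logs the warning far more often than its peers, and the per-connection counts from `count_warnings` may help when trying to narrow the problem down to a particular client.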