[ https://issues.apache.org/jira/browse/KAFKA-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14177723#comment-14177723 ]
Ewen Cheslack-Postava commented on KAFKA-1642:
----------------------------------------------

To summarize the issues fixed now:

* Fix the logic issue with the "expired" check in RecordAccumulator.ready.
* Don't include nodes that already have sendable data when computing the delay until the next check for ready data. Including them doesn't make sense, since their delays will change as soon as we send that data.
* To correctly account for nodes with sendable data, use a poll timeout of 0 whenever we send any data. This guarantees that any necessary delay is recomputed immediately in the next round, after some of the current data has been removed.
* Properly account for nodes with sendable data that are under connection retry backoff. Since they aren't included when computing the next-check delay during the ready-node lookup, we need to account for them later, but only if we conclude the node isn't ready. We have to incorporate the amount of backoff time still required before a retry will be attempted, because nothing else would wake the sender up at the right time (unlike other conditions, such as a full buffer, which only change when data is received).

It might be possible to break this into smaller commits, one per fix, but the order in which they are applied needs care because some of them, applied on their own, result in bad behavior -- the existing client only worked because it often ended up with poll timeouts that were much more aggressive (i.e., often 0). Rough sketches of each change are included at the end of this message.

> [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network connection is lost
> ----------------------------------------------------------------------------------------
>
>                 Key: KAFKA-1642
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1642
>             Project: Kafka
>          Issue Type: Bug
>          Components: producer
>    Affects Versions: 0.8.2
>            Reporter: Bhavesh Mistry
>            Assignee: Ewen Cheslack-Postava
>         Attachments: KAFKA-1642.patch, KAFKA-1642_2014-10-20_17:33:57.patch
>
>
> I see my CPU spike to 100% when the network connection is lost for a while. It seems the network I/O thread is very busy logging the following error message. Is this expected behavior?
> 2014-09-17 14:06:16.830 [kafka-producer-network-thread] ERROR org.apache.kafka.clients.producer.internals.Sender - Uncaught error in kafka producer I/O thread:
> java.lang.IllegalStateException: No entry found for node -2
> at org.apache.kafka.clients.ClusterConnectionStates.nodeState(ClusterConnectionStates.java:110)
> at org.apache.kafka.clients.ClusterConnectionStates.disconnected(ClusterConnectionStates.java:99)
> at org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:394)
> at org.apache.kafka.clients.NetworkClient.maybeUpdateMetadata(NetworkClient.java:380)
> at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:174)
> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:175)
> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:115)
> at java.lang.Thread.run(Thread.java:744)
> Thanks,
> Bhavesh
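To make the first two items concrete, here is a minimal, self-contained sketch of the per-partition decision in RecordAccumulator.ready. It is only an illustration of the logic described above, not the patch itself; names like lastAttemptMs, retryBackoffMs, and nextCheckDelayMs are chosen to mirror the producer code but are assumptions in this sketch.

{code:java}
// Simplified model of the per-partition check in RecordAccumulator.ready.
// Illustration only; names mirror the producer code but this is not the patch.
final class ReadyCheckSketch {

    static final class Result {
        final boolean sendable;      // the leader has data we could send now
        final long nextCheckDelayMs; // how long until this partition needs another look
        Result(boolean sendable, long nextCheckDelayMs) {
            this.sendable = sendable;
            this.nextCheckDelayMs = nextCheckDelayMs;
        }
    }

    static Result check(long nowMs, long lastAttemptMs, int attempts,
                        long retryBackoffMs, long lingerMs, boolean batchFull) {
        boolean backingOff = attempts > 0 && lastAttemptMs + retryBackoffMs > nowMs;
        long waitedTimeMs = nowMs - lastAttemptMs;
        // "expired" is measured against the wait we are actually applying:
        // the retry backoff when backing off, the linger time otherwise.
        long timeToWaitMs = backingOff ? retryBackoffMs : lingerMs;
        boolean expired = waitedTimeMs >= timeToWaitMs;
        boolean sendable = batchFull || expired;
        if (sendable && !backingOff) {
            // Sendable partitions do not contribute to the next-check delay;
            // their state changes as soon as we send, and the delay is
            // recomputed on the next pass (with a poll timeout of 0).
            return new Result(true, Long.MAX_VALUE);
        } else {
            long timeLeftMs = Math.max(timeToWaitMs - waitedTimeMs, 0);
            return new Result(false, timeLeftMs);
        }
    }

    public static void main(String[] args) {
        // A batch created 3 ms ago with linger.ms = 100 and no retries: not
        // sendable yet, so the caller can sleep ~97 ms instead of spinning.
        Result r = check(1000L, 997L, 0, 100L, 100L, false);
        System.out.println("sendable=" + r.sendable + " nextCheckDelayMs=" + r.nextCheckDelayMs);
    }
}
{code}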
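The third item only changes how the Sender loop picks the timeout it passes to poll(). A rough sketch of that selection follows; the method and parameter names are illustrative stand-ins for the logic in Sender.run, not the actual code.

{code:java}
// Sketch of the poll-timeout selection in the Sender loop (names illustrative).
final class PollTimeoutSketch {

    static long choosePollTimeout(boolean sendingThisRound,
                                  long nextReadyCheckDelayMs,
                                  long notReadyBackoffMs) {
        // Wait only as long as the soonest known event among nodes that are
        // not ready to send.
        long pollTimeout = Math.min(nextReadyCheckDelayMs, notReadyBackoffMs);
        // If any data is being sent this round, poll with timeout 0 so the
        // delay is recomputed on the very next pass, after that data is drained.
        if (sendingThisRound)
            pollTimeout = 0;
        return pollTimeout;
    }

    public static void main(String[] args) {
        // Nothing sendable: sleep until the earlier of the two deadlines (50 ms).
        System.out.println(choosePollTimeout(false, 97L, 50L));
        // Something sendable: don't sleep at all; recompute next round.
        System.out.println(choosePollTimeout(true, 97L, 50L));
    }
}
{code}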
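For the last item, the shape of the change in the Sender is roughly the following: nodes that have sendable data but whose connections are still backing off get pruned from the ready set, and their remaining backoff feeds into the poll timeout chosen above. The Client interface below is a local stand-in so the sketch compiles on its own; the ready/connectionDelay names are assumptions about the client API for the purposes of this illustration.

{code:java}
// Sketch of the backoff accounting: a node with sendable data whose connection
// is still backing off won't be woken by anything else, so its remaining
// backoff must bound the poll timeout.
import java.util.Iterator;
import java.util.Set;

final class NotReadyBackoffSketch {

    // Stand-in for the network client; method names are assumptions here.
    interface Client {
        boolean ready(String node, long nowMs);        // connected and not backing off?
        long connectionDelay(String node, long nowMs); // remaining reconnect backoff in ms
    }

    /**
     * Removes nodes we cannot currently send to and returns how long we may
     * sleep before one of them could become usable again.
     */
    static long pruneNotReady(Client client, Set<String> readyNodes, long nowMs) {
        long notReadyTimeout = Long.MAX_VALUE;
        for (Iterator<String> iter = readyNodes.iterator(); iter.hasNext(); ) {
            String node = iter.next();
            if (!client.ready(node, nowMs)) {
                // The node has data to send but isn't usable yet; drop it from
                // this round and remember how long its backoff still has to run.
                iter.remove();
                notReadyTimeout = Math.min(notReadyTimeout, client.connectionDelay(node, nowMs));
            }
        }
        return notReadyTimeout;
    }
}
{code}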