[ https://issues.apache.org/jira/browse/KAFKA-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14177723#comment-14177723 ]
Ewen Cheslack-Postava commented on KAFKA-1642:
----------------------------------------------
To summarize the issues fixed now (illustrative sketches of the resulting logic follow the list):
* Fix logic issue with "expired" in RecordAccumulator.ready
* Don't include nodes that can send data when computing the delay until the
next check for ready data. Including these doesn't make sense since their
delays will change when we send data.
* To correctly account for nodes with sendable data, use a poll timeout of 0 if we
send any data. This guarantees any remaining delay is recomputed immediately in
the next round, after some of the current data has been removed.
* Properly account for nodes with sendable data that are under connection retry
backoff. Since they weren't included when computing the next check delay during
the ready-node lookup, we need to account for them later, but only if we
conclude the node isn't ready. We need to incorporate the amount of backoff time
still required before a retry will be performed (nothing else would wake the
I/O thread up at the right time, unlike other conditions such as a full buffer,
which only change when data is received).
It might be possible to break this into smaller commits, one per fix, but the
order in which they are applied would need care because some of them result in
bad behavior on their own -- the existing client only worked because it often
ended up with poll timeouts that were much more aggressive (i.e., often 0).
> [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network
> connection is lost
> ---------------------------------------------------------------------------------------
>
> Key: KAFKA-1642
> URL: https://issues.apache.org/jira/browse/KAFKA-1642
> Project: Kafka
> Issue Type: Bug
> Components: producer
> Affects Versions: 0.8.2
> Reporter: Bhavesh Mistry
> Assignee: Ewen Cheslack-Postava
> Attachments: KAFKA-1642.patch, KAFKA-1642_2014-10-20_17:33:57.patch
>
>
> I see my CPU spike to 100% when the network connection is lost for a while. It
> seems the network I/O threads are very busy logging the following error message.
> Is this expected behavior?
> 2014-09-17 14:06:16.830 [kafka-producer-network-thread] ERROR org.apache.kafka.clients.producer.internals.Sender - Uncaught error in kafka producer I/O thread:
> java.lang.IllegalStateException: No entry found for node -2
> at org.apache.kafka.clients.ClusterConnectionStates.nodeState(ClusterConnectionStates.java:110)
> at org.apache.kafka.clients.ClusterConnectionStates.disconnected(ClusterConnectionStates.java:99)
> at org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:394)
> at org.apache.kafka.clients.NetworkClient.maybeUpdateMetadata(NetworkClient.java:380)
> at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:174)
> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:175)
> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:115)
> at java.lang.Thread.run(Thread.java:744)
> Thanks,
> Bhavesh