[ https://issues.apache.org/jira/browse/KAFKA-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14177723#comment-14177723 ]

Ewen Cheslack-Postava commented on KAFKA-1642:
----------------------------------------------

To summarize the issues fixed now (a rough sketch of the combined timeout logic 
follows the list):
* Fix a logic issue with the "expired" check in RecordAccumulator.ready.
* Don't include nodes that can already send data when computing the delay until 
the next check for ready data. Including them doesn't make sense since their 
delays will change as soon as we send to them.
* To correctly account for nodes with sendable data, use a poll timeout of 0 
whenever we send anything. This guarantees any necessary delay is recomputed 
immediately in the next round, after the current data has been removed.
* Properly account for nodes with sendable data that are under connection retry 
backoff. Since they weren't included when computing the next check delay during 
the ready-node lookup, we need to account for them later, but only once we 
conclude the node isn't ready. At that point we incorporate the amount of 
backoff time still required before a retry will be performed, since nothing 
else would wake the thread up at the right time (unlike other conditions, such 
as a full buffer, which only change if data is received).
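
For illustration, here is a minimal sketch of how these pieces combine into the 
poll timeout. This is not the actual RecordAccumulator/Sender code; the names 
(NodeInfo, pollTimeoutMs, retryBackoffLeftMs, maxBlockMs) are invented for the 
example and only stand in for the real bookkeeping:

{code:java}
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch only -- not the actual Kafka client classes.
public class PollTimeoutSketch {

    /** Minimal stand-in for the per-node state the sender looks at each loop. */
    static class NodeInfo {
        final boolean hasSendableData;   // a full or expired batch is ready for this node
        final long timeUntilDataReadyMs; // remaining linger time when nothing is sendable yet
        final boolean connectionReady;   // connection usable right now
        final long retryBackoffLeftMs;   // remaining connection-retry backoff, 0 if none

        NodeInfo(boolean hasSendableData, long timeUntilDataReadyMs,
                 boolean connectionReady, long retryBackoffLeftMs) {
            this.hasSendableData = hasSendableData;
            this.timeUntilDataReadyMs = timeUntilDataReadyMs;
            this.connectionReady = connectionReady;
            this.retryBackoffLeftMs = retryBackoffLeftMs;
        }
    }

    /** How long the I/O thread may block in poll() before it must look again. */
    static long pollTimeoutMs(List<NodeInfo> nodes, long maxBlockMs) {
        long timeout = maxBlockMs;
        boolean sentSomething = false;

        for (NodeInfo node : nodes) {
            if (!node.hasSendableData) {
                // No sendable data yet: wake up when its linger time elapses.
                timeout = Math.min(timeout, node.timeUntilDataReadyMs);
            } else if (!node.connectionReady) {
                // Sendable data, but the connection is backing off after a failure.
                // Nothing else would wake us at the right time, so bound the poll
                // by the backoff still remaining before a retry is allowed.
                timeout = Math.min(timeout, node.retryBackoffLeftMs);
            } else {
                // We can send to this node now; its delay will change once the
                // data is gone, so it must not influence the computed delay.
                sentSomething = true;
            }
        }

        // If anything was sent, recompute the delays immediately next round.
        return sentSomething ? 0 : timeout;
    }

    // Tiny demonstration: one node still lingering, one backing off, one sendable.
    public static void main(String[] args) {
        List<NodeInfo> nodes = new ArrayList<>();
        nodes.add(new NodeInfo(false, 30, true, 0));  // waiting on linger.ms
        nodes.add(new NodeInfo(true, 0, false, 80));  // in connection retry backoff
        nodes.add(new NodeInfo(true, 0, true, 0));    // sendable right now

        System.out.println(pollTimeoutMs(nodes, 5000)); // 0: we sent, recheck at once
        nodes.remove(2);
        System.out.println(pollTimeoutMs(nodes, 5000)); // 30: earliest thing that can change
    }
}
{code}

The key point is that sendable nodes never stretch the timeout -- they collapse 
it to 0 once we send -- while nodes in retry backoff bound it by their remaining 
backoff, so the I/O thread neither spins nor oversleeps.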

It might be possible to break this into smaller commits, one per fix, but the 
order in which they are applied needs care because some of them, applied on 
their own, result in bad behavior -- the existing client only worked because it 
often ended up with poll timeouts that were much more aggressive (i.e., often 0).

> [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network connection is lost
> ---------------------------------------------------------------------------------------
>
>                 Key: KAFKA-1642
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1642
>             Project: Kafka
>          Issue Type: Bug
>          Components: producer 
>    Affects Versions: 0.8.2
>            Reporter: Bhavesh Mistry
>            Assignee: Ewen Cheslack-Postava
>         Attachments: KAFKA-1642.patch, KAFKA-1642_2014-10-20_17:33:57.patch
>
>
> I see my CPU spike to 100% when the network connection is lost for a while. It 
> seems the network I/O thread is very busy logging the following error message. 
> Is this expected behavior?
> 2014-09-17 14:06:16.830 [kafka-producer-network-thread] ERROR org.apache.kafka.clients.producer.internals.Sender - Uncaught error in kafka producer I/O thread: 
> java.lang.IllegalStateException: No entry found for node -2
> at org.apache.kafka.clients.ClusterConnectionStates.nodeState(ClusterConnectionStates.java:110)
> at org.apache.kafka.clients.ClusterConnectionStates.disconnected(ClusterConnectionStates.java:99)
> at org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:394)
> at org.apache.kafka.clients.NetworkClient.maybeUpdateMetadata(NetworkClient.java:380)
> at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:174)
> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:175)
> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:115)
> at java.lang.Thread.run(Thread.java:744)
> Thanks,
> Bhavesh



