[
https://issues.apache.org/jira/browse/KAFKA-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14224041#comment-14224041
]
Bhavesh Mistry edited comment on KAFKA-1642 at 11/25/14 4:37 AM:
-----------------------------------------------------------------
[~ewencp],
I hope the steps above give you a complete way to reproduce the problems with
the run() method. It would be really great if we could make the client more
resilient and robust, so that network and broker instability does not cause CPU
spikes and degrade application performance. Hence, I would strongly suggest that
we at least detect how long run(time) is taking and, based on some
configuration, apply CPU throttling just to be more defensive, or at least
detect that the I/O thread is consuming CPU cycles.
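A minimal sketch of the kind of detection suggested above, assuming the I/O loop can time each run(time) pass. BusyLoopGuard and its three parameters are hypothetical names used only for illustration; they are not part of the Kafka client or its configuration.

```java
/**
 * Hypothetical guard (not part of the Kafka client): counts consecutive loop
 * iterations that finish suspiciously fast and recommends a back-off once the
 * loop looks like it is spinning.
 */
final class BusyLoopGuard {
    private final long minIterationNanos; // iterations faster than this count as "hot"
    private final int maxHotIterations;   // consecutive hot iterations tolerated
    private final long backoffMs;         // sleep recommended once the limit is hit
    private int hotIterations = 0;

    BusyLoopGuard(long minIterationNanos, int maxHotIterations, long backoffMs) {
        this.minIterationNanos = minIterationNanos;
        this.maxHotIterations = maxHotIterations;
        this.backoffMs = backoffMs;
    }

    /**
     * Records the duration of one iteration.
     * Returns the number of milliseconds the caller should sleep, or 0 if none.
     */
    long record(long elapsedNanos) {
        if (elapsedNanos < minIterationNanos) {
            hotIterations++;
        } else {
            hotIterations = 0; // a slow (blocking) iteration resets the streak
        }
        if (hotIterations >= maxHotIterations) {
            hotIterations = 0;
            return backoffMs;
        }
        return 0L;
    }
}
```

The caller would measure each pass with System.nanoTime(), hand the elapsed time to record(), and sleep for the returned number of milliseconds when it is non-zero. Logging at that point would also cover the "at least detect" case without throttling.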
By the way, the experimental patch still works for the steps described above as
well, due to its hard-coded back-off.
Any time you have a patch or anything else, please let me know and I will test
it out.
Once again, thanks for your detailed analysis.
Please look into ClusterConnectionStates and how it manages the state of a node
when it disconnects immediately.
Please also look at connecting(int node, long now) and the code below (I feel
the connecting call needs to come before selector.connect, not after):
selector.connect(node.id(), new InetSocketAddress(node.host(), node.port()),
this.socketSendBuffer, this.socketReceiveBuffer);
this.connectionStates.connecting(node.id(), now);
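To illustrate the ordering concern above, here is a simplified model, not the actual Kafka classes. ConnectionStatesSketch, Connector, and the two initiateConnect variants are hypothetical names. The sketch assumes that a failed connect attempt is handled by calling disconnected(), which matches the stack trace in the issue description.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

/** Simplified, hypothetical model of per-node connection state tracking. */
final class ConnectionStatesSketch {
    enum State { CONNECTING, DISCONNECTED }

    /** Stand-in for the socket-level connect call, which may fail immediately. */
    interface Connector {
        void connect(int node) throws IOException;
    }

    private final Map<Integer, State> states = new HashMap<>();

    void connecting(int node) {
        states.put(node, State.CONNECTING);
    }

    void disconnected(int node) {
        // Mirrors the failure in the report: no entry exists for this node yet.
        if (!states.containsKey(node)) {
            throw new IllegalStateException("No entry found for node " + node);
        }
        states.put(node, State.DISCONNECTED);
    }

    State state(int node) {
        return states.get(node);
    }

    /** State is recorded after connect: an immediate failure finds no entry. */
    static void initiateConnectRecordAfter(ConnectionStatesSketch s, int node, Connector c) {
        try {
            c.connect(node);
            s.connecting(node);
        } catch (IOException e) {
            s.disconnected(node); // throws IllegalStateException
        }
    }

    /** State is recorded before connect: an immediate failure is handled cleanly. */
    static void initiateConnectRecordFirst(ConnectionStatesSketch s, int node, Connector c) {
        try {
            s.connecting(node);
            c.connect(node);
        } catch (IOException e) {
            s.disconnected(node); // entry exists, node becomes DISCONNECTED
        }
    }
}
```

In this model, a Connector that throws IOException makes the first variant fail with "No entry found for node", while the second leaves the node in the DISCONNECTED state so that a reconnect back-off could apply.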
Thanks,
Bhavesh
was (Author: bmis13):
[~ewencp],
I hope the steps above give you a complete way to reproduce the problems with
the run() method. It would be really great if we could make the client more
resilient and robust, so that network and broker instability does not cause CPU
spikes and degrade application performance. Hence, I would strongly suggest that
we at least detect how long run(time) is taking and, based on some
configuration, apply CPU throttling just to be more defensive, or at least
detect that the I/O thread is consuming CPU cycles.
By the way, the experimental patch still works for the steps described above.
Thanks,
Bhavesh
> [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network
> connection is lost
> ---------------------------------------------------------------------------------------
>
> Key: KAFKA-1642
> URL: https://issues.apache.org/jira/browse/KAFKA-1642
> Project: Kafka
> Issue Type: Bug
> Components: producer
> Affects Versions: 0.8.1.1, 0.8.2
> Reporter: Bhavesh Mistry
> Assignee: Ewen Cheslack-Postava
> Priority: Blocker
> Fix For: 0.8.2
>
> Attachments:
> 0001-Initial-CPU-Hish-Usage-by-Kafka-FIX-and-Also-fix-CLO.patch,
> KAFKA-1642.patch, KAFKA-1642_2014-10-20_17:33:57.patch,
> KAFKA-1642_2014-10-23_16:19:41.patch
>
>
> I see my CPU spike to 100% when the network connection is lost for a while. It
> seems the network I/O thread is very busy logging the following error message.
> Is this expected behavior?
> 2014-09-17 14:06:16.830 [kafka-producer-network-thread] ERROR org.apache.kafka.clients.producer.internals.Sender - Uncaught error in kafka producer I/O thread:
> java.lang.IllegalStateException: No entry found for node -2
> at org.apache.kafka.clients.ClusterConnectionStates.nodeState(ClusterConnectionStates.java:110)
> at org.apache.kafka.clients.ClusterConnectionStates.disconnected(ClusterConnectionStates.java:99)
> at org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:394)
> at org.apache.kafka.clients.NetworkClient.maybeUpdateMetadata(NetworkClient.java:380)
> at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:174)
> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:175)
> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:115)
> at java.lang.Thread.run(Thread.java:744)
> Thanks,
> Bhavesh