[
https://issues.apache.org/jira/browse/KAFKA-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14224039#comment-14224039
]
Bhavesh Mistry commented on KAFKA-1642:
---------------------------------------
Here are some more cases that reproduce this by simulating a network connection issue
with only one of the brokers; the problem still persists:
Case 1: the connection to one broker is down (note that, according to ZK, the leader for the
partition is still b1)
There are three brokers: b1, b2, b3.
1) Start your daemon program and keep sending data to all the brokers.
2) Confirm that data is flowing to the brokers; netstat -a | grep -E 'b1|b2|b3' shows the
connections (keep pumping data for 5 minutes and observe normal behavior using
top -pid <java_pid> on macOS or top -p <java_pid> on Linux).
3) Simulate a network problem that prevents establishing new TCP connections, as follows,
while the Java program continues to pump data aggressively (note that the existing TCP
connection to b1 is still active and connected):
a) sudo vi /etc/hosts and add the entry "127.0.0.1 b1"
b) Run /etc/init.d/network restart. After a while (5 to 7 minutes) you will see the
issue; keep pumping data. Repeating this for b2 makes the CPU consumption even
higher.
4) Under heavy data pumping, the producer now tries to establish a new TCP
connection to b1 and gets connection refused. Note that the CPU spikes up
again and stays there, just because the connection could not be established.
Case 2: Simulate a firewall rule, such as one that only allows 4 TCP connections
to each broker.
Do steps 1, 2 and 3 above.
4) Use an iptables rule to reject the connections.
To start an "enforcing firewall":
iptables -A OUTPUT -p tcp -m tcp -d b1 --dport 9092 -j REJECT
5) Keep pumping data while iptables rejects the connections (you will see the CPU spike to
200% or more, depending on the number of producers).
To "recover":
iptables -D OUTPUT -p tcp -m tcp -d b1 --dport 9092 -j REJECT
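To check that the extra CPU in step 5 comes from the producers' I/O threads (each KafkaProducer instance runs its own, named kafka-producer-network-thread in the quoted report), per-thread CPU time can be read from inside the JVM with the standard `java.lang.management.ThreadMXBean`. This is a sketch: the class name `ThreadCpuReport` is made up, and it assumes the JVM supports and has enabled thread CPU time measurement (typically true on HotSpot). CPU time is cumulative, so compare two samples taken a few seconds apart; `top -H -p <java_pid>` gives a similar per-thread view from outside the process on Linux.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: an in-process view of CPU time per JVM thread, so a
// spinning I/O thread stands out the way it would in `top -H -p <java_pid>`.
final class ThreadCpuReport {

    /** Total CPU milliseconds per live thread name (threads sharing a name are summed). */
    static Map<String, Long> cpuMillisByThreadName() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Map<String, Long> totals = new TreeMap<>();
        if (!mx.isThreadCpuTimeSupported()) {
            return totals; // this JVM cannot measure per-thread CPU time
        }
        for (long id : mx.getAllThreadIds()) {
            ThreadInfo info = mx.getThreadInfo(id); // null if the thread has since exited
            long nanos = mx.getThreadCpuTime(id);   // -1 if exited or measurement is disabled
            if (info != null && nanos >= 0) {
                totals.merge(info.getThreadName(), nanos / 1_000_000, Long::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        // Print threads from most to least CPU time consumed so far.
        cpuMillisByThreadName().entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .forEach(e -> System.out.println(e.getValue() + " ms  " + e.getKey()));
    }
}
```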
> [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network
> connection is lost
> ---------------------------------------------------------------------------------------
>
> Key: KAFKA-1642
> URL: https://issues.apache.org/jira/browse/KAFKA-1642
> Project: Kafka
> Issue Type: Bug
> Components: producer
> Affects Versions: 0.8.1.1, 0.8.2
> Reporter: Bhavesh Mistry
> Assignee: Ewen Cheslack-Postava
> Priority: Blocker
> Fix For: 0.8.2
>
> Attachments:
> 0001-Initial-CPU-Hish-Usage-by-Kafka-FIX-and-Also-fix-CLO.patch,
> KAFKA-1642.patch, KAFKA-1642_2014-10-20_17:33:57.patch,
> KAFKA-1642_2014-10-23_16:19:41.patch
>
>
> I see my CPU spike to 100% when the network connection is lost for a while. It
> seems the network I/O thread is very busy logging the following error message. Is
> this expected behavior?
> 2014-09-17 14:06:16.830 [kafka-producer-network-thread] ERROR org.apache.kafka.clients.producer.internals.Sender - Uncaught error in kafka producer I/O thread:
> java.lang.IllegalStateException: No entry found for node -2
>     at org.apache.kafka.clients.ClusterConnectionStates.nodeState(ClusterConnectionStates.java:110)
>     at org.apache.kafka.clients.ClusterConnectionStates.disconnected(ClusterConnectionStates.java:99)
>     at org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:394)
>     at org.apache.kafka.clients.NetworkClient.maybeUpdateMetadata(NetworkClient.java:380)
>     at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:174)
>     at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:175)
>     at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:115)
>     at java.lang.Thread.run(Thread.java:744)
> Thanks,
> Bhavesh
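The quoted trace shows NetworkClient.initiateConnect calling ClusterConnectionStates.disconnected, which fails in nodeState because no state was ever recorded for node -2. The model below is hypothetical (the class and method names are made up and it is not the Kafka source); it shows one ordering that produces exactly this exception: recording the "connecting" state only after the connect call returns, so a connect that is refused immediately reaches the failure handler with no entry. Treating this as the cause addressed by the attached patches is an assumption. Since the report says the I/O thread keeps logging the error, the loop evidently catches it and tries again on the next pass, so the same failure repeats without pause.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model (not the Kafka source) of a per-node connection-state table.
// As in the trace, disconnected() goes through nodeState(), which throws
// IllegalStateException when no entry was ever recorded for the node.
final class ConnectionStatesModel {
    enum State { CONNECTING, DISCONNECTED }

    interface Connector { void connect() throws IOException; }

    private final Map<Integer, State> states = new HashMap<>();

    void connecting(int node) { states.put(node, State.CONNECTING); }

    void disconnected(int node) {
        nodeState(node); // fails if connecting() was never called for this node
        states.put(node, State.DISCONNECTED);
    }

    State nodeState(int node) {
        State state = states.get(node);
        if (state == null) {
            throw new IllegalStateException("No entry found for node " + node);
        }
        return state;
    }

    // Ordering A: record CONNECTING only after connect() returns. A connect that is
    // refused immediately skips that line, so the failure path hits an unknown node.
    static void connectRecordingAfter(ConnectionStatesModel cs, int node, Connector c) {
        try {
            c.connect();
            cs.connecting(node);
        } catch (IOException refused) {
            cs.disconnected(node); // IllegalStateException escapes from here
        }
    }

    // Ordering B: record CONNECTING first, so the failure path always finds an entry.
    static void connectRecordingFirst(ConnectionStatesModel cs, int node, Connector c) {
        cs.connecting(node);
        try {
            c.connect();
        } catch (IOException refused) {
            cs.disconnected(node);
        }
    }
}
```

With a connector that always throws IOException("Connection refused"), ordering A raises IllegalStateException("No entry found for node -2") for node -2, while ordering B leaves the node marked DISCONNECTED.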
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)