[ 
https://issues.apache.org/jira/browse/KAFKA-1642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14232061#comment-14232061
 ] 

Bhavesh Mistry commented on KAFKA-1642:
---------------------------------------

Hi  [~ewencp],

I will not have time to validate this patch till next week.  

Here is my comments:

1) You still have not address the Producer.close() method issue that in event 
of network connection lost or other events happens, IO thread will not be 
killed and close method hangs. In patch that I have provided, I had timeout for 
join method and interrupted IO thread.  I think we need similar for this.
2) Also, can we please add JMX monitoring for IO tread to know how quick it is 
running.  It will great to add this and run() method will report duration to 
metric.
{code}
            try{
                ThreadMXBean bean = ManagementFactory.getThreadMXBean( );
                if(bean.isThreadCpuTimeSupported() && 
bean.isThreadCpuTimeEnabled()){
                        this.ioTheadCPUTime = metrics.sensor("iothread-cpu");
                    this.ioTheadCPUTime.add("iothread-cpu-ms", "The Rate Of CPU 
Cycle used by iothead in NANOSECONDS", new Rate(TimeUnit.NANOSECONDS) {
                        public double measure(MetricConfig config, long now) {
                            return (now - metadata.lastUpdate()) / 1000.0;
                        }
                    });                         
                }
            }catch(Throwable th){
                log.warn("Not able to set the CPU time... etc");
            }
{code}

3)  Please check the timeout final value in *pollTimeout* if it is zero for 
constantly then we need to slow IO thread down.
4)  Defensive check in for back off  in run() method when IO thread is 
aggressive:  

5)  When all nodes are disconnected, do you still want to spin the IO Thread ?

6)  When you have a firewall rule that says "you can only have 2 concurrent TCP 
connections from Client to Brokers" and client still have live TCP connection 
to same not (Broker), but new TCP connection is rejected. Node State will be 
marked as Disconnected in initiateConnect ?  Are you handling that gracefully  ?

By the way, thank you very much for quick reply and with new patch.  I 
appreciate your help.

Thanks,

Bhavesh 

> [Java New Producer Kafka Trunk] CPU Usage Spike to 100% when network 
> connection is lost
> ---------------------------------------------------------------------------------------
>
>                 Key: KAFKA-1642
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1642
>             Project: Kafka
>          Issue Type: Bug
>          Components: producer 
>    Affects Versions: 0.8.2
>            Reporter: Bhavesh Mistry
>            Assignee: Ewen Cheslack-Postava
>            Priority: Blocker
>             Fix For: 0.8.2
>
>         Attachments: 
> 0001-Initial-CPU-Hish-Usage-by-Kafka-FIX-and-Also-fix-CLO.patch, 
> KAFKA-1642.patch, KAFKA-1642.patch, KAFKA-1642_2014-10-20_17:33:57.patch, 
> KAFKA-1642_2014-10-23_16:19:41.patch
>
>
> I see my CPU spike to 100% when network connection is lost for while.  It 
> seems network  IO thread are very busy logging following error message.  Is 
> this expected behavior ?
> 2014-09-17 14:06:16.830 [kafka-producer-network-thread] ERROR 
> org.apache.kafka.clients.producer.internals.Sender - Uncaught error in kafka 
> producer I/O thread: 
> java.lang.IllegalStateException: No entry found for node -2
> at 
> org.apache.kafka.clients.ClusterConnectionStates.nodeState(ClusterConnectionStates.java:110)
> at 
> org.apache.kafka.clients.ClusterConnectionStates.disconnected(ClusterConnectionStates.java:99)
> at 
> org.apache.kafka.clients.NetworkClient.initiateConnect(NetworkClient.java:394)
> at 
> org.apache.kafka.clients.NetworkClient.maybeUpdateMetadata(NetworkClient.java:380)
> at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:174)
> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:175)
> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:115)
> at java.lang.Thread.run(Thread.java:744)
> Thanks,
> Bhavesh



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to