[ 
https://issues.apache.org/jira/browse/KAFKA-1788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14234595#comment-14234595
 ] 

Bhavesh Mistry commented on KAFKA-1788:
---------------------------------------

We also need to fix the Producer Close which hangs JVM because io.join() thread 
does not exit.  Please refer to KAFKA-1642 for more details.  So Kakfa core Dev 
needs to give guidance on how to solve this problem.

Please see below comments from that linked issue.


1) Producer.close() method issue is not address with patch. In event of network 
connection lost or other events happens, IO thread will not be killed and close 
method hangs. In patch that I have provided, I had timeout for join method and 
interrupted IO thread. I think we need similar solution.

[~ewencp],

1. I'm specifically trying to address the CPU usage here. I realize from your 
perspective they are closely related since they're both can be triggered by a 
loss of network connectivity, but internally they're really separate issues – 
the CPU usage has to do with incorrect timeouts and the join() issues is due to 
the lack of timeouts on produce operations. That's why I pointed you toward 
KAFKA-1788. If a timeout is added for data in the producer, that would resolve 
the close issue as well since any data waiting in the producer would eventually 
timeout and the IO thread could exit. I think that's the cleanest solution 
since it solves both problems with a single setting (the amount of time your 
willing to wait before discarding data). If you think a separate timeout 
specifically for Producer.close() is worthwhile I'd suggest filing a separate 
JIRA for that.



> producer record can stay in RecordAccumulator forever if leader is no 
> available
> -------------------------------------------------------------------------------
>
>                 Key: KAFKA-1788
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1788
>             Project: Kafka
>          Issue Type: Bug
>          Components: core, producer 
>    Affects Versions: 0.8.2
>            Reporter: Jun Rao
>            Assignee: Jun Rao
>              Labels: newbie++
>             Fix For: 0.8.3
>
>
> In the new producer, when a partition has no leader for a long time (e.g., 
> all replicas are down), the records for that partition will stay in the 
> RecordAccumulator until the leader is available. This may cause the 
> bufferpool to be full and the callback for the produced message to block for 
> a long time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to