[ 
https://issues.apache.org/jira/browse/KAFKA-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16019542#comment-16019542
 ] 

dhiraj prajapati commented on KAFKA-5153:
-----------------------------------------

Hi all,
We have a 3-node cluster on our production environment. We recently upgraded 
kafka from 0.9.0.1 to 0.10.1.0 and we are seeing a similar issue of 
intermittent disconnection. We never had this issue in 0.9.0.1.

Below is the exception stack trace:

[2017-05-15 09:33:55,398] WARN [ReplicaFetcherThread-0-2], Error in fetch 
kafka.server.ReplicaFetcherThread$FetchRequest@7213d6d (kafka.server.
ReplicaFetcherThread)
java.io.IOException: Connection to 2 was disconnected before the response was 
read
        at 
kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.sca
la:115)
        at 
kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1$$anonfun$apply$1.apply(NetworkClientBlockingOps.sca
la:112)
        at scala.Option.foreach(Option.scala:257)
        at 
kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1.apply(NetworkClientBlockingOps.scala:112)
        at 
kafka.utils.NetworkClientBlockingOps$$anonfun$blockingSendAndReceive$extension$1.apply(NetworkClientBlockingOps.scala:108)
        at 
kafka.utils.NetworkClientBlockingOps$.recursivePoll$1(NetworkClientBlockingOps.scala:137)
        at 
kafka.utils.NetworkClientBlockingOps$.kafka$utils$NetworkClientBlockingOps$$pollContinuously$extension(NetworkClientBlockingOps.scal
a:143)
        at 
kafka.utils.NetworkClientBlockingOps$.blockingSendAndReceive$extension(NetworkClientBlockingOps.scala:108)
        at 
kafka.server.ReplicaFetcherThread.sendRequest(ReplicaFetcherThread.scala:253)
        at 
kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:238)
        at 
kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:42)
        at 
kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:118)
        at 
kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:103)
        at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)

Is there a fix for this issue in any of the kafka 10 versions?

> KAFKA Cluster : 0.10.2.0 : Servers Getting disconnected : Service Impacting
> ---------------------------------------------------------------------------
>
>                 Key: KAFKA-5153
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5153
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.10.2.0
>         Environment: RHEL 6
> Java Version  1.8.0_91-b14
>            Reporter: Arpan
>            Priority: Critical
>         Attachments: server_1_72server.log, server_2_73_server.log, 
> server_3_74Server.log, server.properties, ThreadDump_1493564142.dump, 
> ThreadDump_1493564177.dump, ThreadDump_1493564249.dump
>
>
> Hi Team, 
> I was earlier referring to issue KAFKA-4477 because the problem i am facing 
> is similar. I tried to search the same reference in release docs as well but 
> did not get anything in 0.10.1.1 or 0.10.2.0. I am currently using 
> 2.11_0.10.2.0.
> I am have 3 node cluster for KAFKA and cluster for ZK as well on the same set 
> of servers in cluster mode. We are having around 240GB of data getting 
> transferred through KAFKA everyday. What we are observing is disconnect of 
> the server from cluster and ISR getting reduced and it starts impacting 
> service.
> I have also observed file descriptor count getting increased a bit, in normal 
> circumstances we have not observed FD count more than 500 but when issue 
> started we were observing it in the range of 650-700 on all 3 servers. 
> Attaching thread dumps of all 3 servers when we started facing the issue 
> recently.
> The issue get vanished once you bounce the nodes and the set up is not 
> working more than 5 days without this issue. Attaching server logs as well.
> Kindly let me know if you need any additional information. Attaching 
> server.properties as well for one of the server (It's similar on all 3 
> serversP)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Reply via email to