[ 
https://issues.apache.org/jira/browse/KAFKA-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16106619#comment-16106619
 ] 

Arpan edited comment on KAFKA-5153 at 7/30/17 7:30 PM:
-------------------------------------------------------

[~dennisxie1992] [~apurva] Yes, we are still facing the same issue even after 
changing those parameters. We also tried increasing the number of partitions, 
on the assumption that each partition was handling too much traffic and 
causing this issue.

The cluster runs fine for 8-10 days after a clean restart, and then we hit the 
same issue again. We are not sure whether this is present only in the 
0.10/0.11 versions (we upgraded to 0.11 as well, hoping it had been resolved 
there, but no luck).




was (Author: arpan.khagram0...@gmail.com):
[~dennisxie1992] Yes, we are still facing the same issue even after changing 
those parameters. We also tried increasing the number of partitions, on the 
assumption that each partition was handling too much traffic and causing this 
issue.

The cluster runs fine for 8-10 days after a clean restart, and then we hit the 
same issue again. We are not sure whether this is present only in the 
0.10/0.11 versions (we upgraded to 0.11 as well, hoping it had been resolved 
there, but no luck).



> KAFKA Cluster : 0.10.2.0 : Servers Getting disconnected : Service Impacting
> ---------------------------------------------------------------------------
>
>                 Key: KAFKA-5153
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5153
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.10.2.0, 0.11.0.0
>         Environment: RHEL 6
> Java Version  1.8.0_91-b14
>            Reporter: Arpan
>            Priority: Critical
>         Attachments: server_1_72server.log, server_2_73_server.log, 
> server_3_74Server.log, server.properties, ThreadDump_1493564142.dump, 
> ThreadDump_1493564177.dump, ThreadDump_1493564249.dump
>
>
> Hi Team, 
> I was earlier referring to issue KAFKA-4477 because the problem I am facing 
> is similar. I also searched the release docs for a reference to it, but did 
> not find anything for 0.10.1.1 or 0.10.2.0. I am currently using 
> 2.11_0.10.2.0.
> I have a 3-node Kafka cluster, with a ZooKeeper cluster running on the same 
> set of servers in cluster mode. Around 240 GB of data is transferred through 
> Kafka every day. What we are observing is that a server disconnects from the 
> cluster, the ISR shrinks, and it starts impacting service.
> I have also observed the file descriptor count increasing somewhat. Under 
> normal circumstances we have not seen the FD count exceed 500, but when the 
> issue started it was in the range of 650-700 on all 3 servers. 
> I am attaching thread dumps from all 3 servers, taken when we started facing 
> the issue recently.
> The issue goes away once the nodes are bounced, but the setup does not run 
> for more than 5 days without hitting it again. I am attaching server logs as 
> well.
> Kindly let me know if you need any additional information. I am also 
> attaching server.properties for one of the servers (it is similar on all 3 
> servers).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
