[ 
https://issues.apache.org/jira/browse/KAFKA-4739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15860011#comment-15860011
 ] 

Vipul Singh edited comment on KAFKA-4739 at 2/9/17 7:21 PM:
------------------------------------------------------------

Hey [~hachikuji]. 

We tried to reproduce the issue again.

Please note: 
1. In our broker config we use a *request.timeout.ms* of *300001* and a 
*group.max.session.timeout.ms* of *300000*.
2. In our client config, the only settings we have changed from the defaults 
are in this gist: https://gist.github.com/neoeahit/c1d4027b975b95267e3cbe506899aef8 
(an illustrative sketch of the relevant client-side settings also follows this list).
3. We grepped for our consumer group in the broker logs around the time the 
disconnection was happening 
(https://gist.github.com/neoeahit/622515ba391ddf8566bf09af880a6ae0 are the 
broker logs). Between 17:55:50,614 and 17:55:51,027 we could not find any 
requests from it.
4. The client-side logs around this time are here: 
https://gist.github.com/neoeahit/3a0a5027bc3499b85cb888918faac2a3 (please note 
we have two brokers, whose IPs are 1.1.1.1 and 1.1.1.6; the actual IPs have 
been changed so the logs can be made publicly available).
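
To make the setup easier to reason about, here is a minimal, self-contained 
sketch of a 0.9.0.1-style consumer with the client-side settings that keep 
coming up in this discussion. The group id, topic name, and the timeout values 
shown are illustrative placeholders only; our actual overrides are in the gist 
linked in point 2.

{code:java}
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ConsumerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Broker addresses, anonymized the same way as in the attached logs.
        props.put("bootstrap.servers", "1.1.1.1:9092,1.1.1.6:9092");
        // Placeholder group id; not our real group name.
        props.put("group.id", "example-consumer-group");
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        // Client-side timeouts relevant to this thread; the values below are
        // illustrative, not the ones from our gist.
        props.put("session.timeout.ms", "30000");
        props.put("request.timeout.ms", "40000");
        props.put("fetch.max.wait.ms", "500");

        KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList("example-topic"));
        while (true) {
            // This is the poll() call that appears to spin without returning data.
            ConsumerRecords<byte[], byte[]> records = consumer.poll(500);
            System.out.println("polled " + records.count() + " records");
        }
    }
}
{code}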

[~huxi_2b] I am puzzled by that 40 seconds myself. We don't set it anywhere in 
our config, yet we keep seeing it in the logs.
Another puzzling thing is that the max_wait_time=500 in the client's fetch 
requests doesn't seem to be honored. Maybe that is because the connection has 
already been disconnected?

Any pointers or troubleshooting steps that would help us figure out this issue 
would be much appreciated. It is causing us a lot of pain, with consumers 
randomly getting blocked. 





> KafkaConsumer poll going into an infinite loop
> ----------------------------------------------
>
>                 Key: KAFKA-4739
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4739
>             Project: Kafka
>          Issue Type: Bug
>          Components: consumer
>    Affects Versions: 0.9.0.1
>            Reporter: Vipul Singh
>
> We are seeing an issue with our Kafka consumer where it seems to go into an 
> infinite loop while polling, trying to fetch data from Kafka. We see the 
> consumer's heartbeat requests on the broker, but nothing else from it.
> We enabled debug-level logging on the consumer and see these logs: 
> https://gist.github.com/neoeahit/757bff7acdea62656f065f4dcb8974b4
> And this just goes on. The way we have been able to reproduce this issue is 
> by restarting the process multiple times in quick succession.


