[jira] [Commented] (KAFKA-6593) Coordinator disconnect in heartbeat thread can cause commitSync to block indefinitely

ASF GitHub Bot (JIRA) Tue, 27 Feb 2018 10:14:49 -0800

    [ https://issues.apache.org/jira/browse/KAFKA-6593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16379070#comment-16379070 ]


ASF GitHub Bot commented on KAFKA-6593:
---------------------------------------

hachikuji opened a new pull request #4625: KAFKA-6593 [WIP]; Fix livelock with 
consumer heartbeat thread in commitSync
URL: https://github.com/apache/kafka/pull/4625
 
 
   Contention for the lock in ConsumerNetworkClient can lead to a livelock 
situation in which an active commitSync is unable to make progress because its 
completion is blocked in the heartbeat thread. The fix is twofold:
   
   1) We change ConsumerNetworkClient to use a fair lock to reduce the chance 
of either thread being starved.
   2) We eliminate the dependence on the lock in ConsumerNetworkClient for 
callback completion so that callbacks will not be blocked by an active poll().
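
A minimal sketch of the two changes described above, not the actual patch. The class name `FairLockSketch` and its methods are hypothetical stand-ins for illustration; only `ReentrantLock` and `ConcurrentLinkedQueue` are real JDK classes. The idea is that the lock is created in fair mode, and completed callbacks are handed over through a lock-free queue so that the thread enqueuing them never waits on the lock.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical sketch; this is not the real ConsumerNetworkClient. */
public class FairLockSketch {
    // Fair mode: the longest-waiting thread is favored when the lock is released.
    private final ReentrantLock lock = new ReentrantLock(true);

    // Completions are queued here without taking the lock.
    private final ConcurrentLinkedQueue<Runnable> pendingCompletion =
            new ConcurrentLinkedQueue<>();

    /** May be called from any thread (e.g. a heartbeat thread); never acquires the lock. */
    public void enqueueCompletion(Runnable callback) {
        pendingCompletion.add(callback);
    }

    /** Stand-in for poll(): I/O happens under the lock, callbacks run after it is released. */
    public int poll() {
        lock.lock();
        try {
            // Network I/O would happen here in a real client.
        } finally {
            lock.unlock();
        }
        return firePendingCompletions();
    }

    /** Drains the queue and runs each callback; returns how many were run. */
    private int firePendingCompletions() {
        int fired = 0;
        Runnable callback;
        while ((callback = pendingCompletion.poll()) != null) {
            callback.run();
            fired++;
        }
        return fired;
    }

    public boolean isFair() {
        return lock.isFair();
    }
}
```

Because `enqueueCompletion` only touches the concurrent queue, a thread holding the lock inside `poll()` cannot block another thread from recording a completion. Fair mode trades some throughput for a lower risk of starvation.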
   
   I've left this as a WIP patch since I am still considering test cases.
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Coordinator disconnect in heartbeat thread can cause commitSync to block 
> indefinitely
> -------------------------------------------------------------------------------------
>
>                 Key: KAFKA-6593
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6593
>             Project: Kafka
>          Issue Type: Bug
>          Components: consumer
>    Affects Versions: 1.0.0, 0.11.0.2
>            Reporter: Jason Gustafson
>            Assignee: Jason Gustafson
>            Priority: Major
>             Fix For: 1.1.0
>
>         Attachments: consumer.log
>
>
> If a coordinator disconnect is observed in the heartbeat thread, it can cause 
> a pending offset commit to be cancelled just before the foreground thread 
> begins waiting on its response in poll(). Since the poll timeout is 
> Long.MAX_VALUE, this will cause the consumer to effectively hang until some 
> other network event causes the poll() to return. We try to protect this case 
> with a poll condition on the future, but this isn't bulletproof since the 
> future can be completed outside of the lock.
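
The race described above can be sketched as follows. This is an illustrative model only: `PollConditionSketch`, `poll`, and `wakeup` are hypothetical names, and a `Condition` stands in for the network selector. The "poll condition" is the `isDone()` check made under the lock; a future completed by another thread after that check, without signalling under the same lock, goes unnoticed until some other event wakes the waiter.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical model of the hang; this is not Kafka code. */
public class PollConditionSketch {
    private final ReentrantLock lock = new ReentrantLock();
    // Stands in for "some network event" that makes poll() return.
    private final Condition networkEvent = lock.newCondition();

    /**
     * Returns true if the future was already done or a network event arrived,
     * false if the timeout elapsed or the thread was interrupted.
     */
    public boolean poll(CompletableFuture<?> future, long timeoutMs) {
        lock.lock();
        try {
            // The poll condition: skip blocking if the future is already complete.
            if (future.isDone()) {
                return true;
            }
            // Race window: if another thread completes the future right here
            // without calling wakeup(), this wait does not notice it. With a
            // timeout of Long.MAX_VALUE that amounts to an effective hang.
            return networkEvent.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            lock.unlock();
        }
    }

    /** Signals waiters under the lock, which closes the race window. */
    public void wakeup() {
        lock.lock();
        try {
            networkEvent.signalAll();
        } finally {
            lock.unlock();
        }
    }
}
```

In this model, a completion path that also calls `wakeup()` would always be observed, while one that only completes the future is observed only if it happens before the `isDone()` check.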



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
