[ https://issues.apache.org/jira/browse/KAFKA-4137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15470445#comment-15470445 ]
Ismael Juma commented on KAFKA-4137: ------------------------------------ KAFKA-1935 may be a duplicate of this one (with less detail, so we maybe we can close that one). > Refactor multi-threaded consumer for safer network layer access > --------------------------------------------------------------- > > Key: KAFKA-4137 > URL: https://issues.apache.org/jira/browse/KAFKA-4137 > Project: Kafka > Issue Type: Improvement > Components: consumer > Reporter: Jason Gustafson > > In KIP-62, we added a background thread to send heartbeats while the user is > processing fetched data from a call to poll(). In the implementation, we > elected to share the instance of {{NetworkClient}} between the foreground > thread and this background thread. After working with the system test failure > in KAFKA-3807, we've realized that this probably wasn't a good decision. It > is very tricky to get the synchronization correct with respect to response > callbacks and reasoning about the multi-threaded behavior is very difficult. > For example, a common pattern is to send a request and then call > {{NetworkClient.poll()}} to await its return. With another thread also > potentially calling poll(), the response can actually return before the > sending thread itself invokes poll(). This can cause unnecessary (and > potentially unbounded) blocking, and avoiding it is quite complex. > A different approach we've discussed would be to use two instances of > NetworkClient, one dedicated to fetching, and one dedicated to coordinator > communication. The fetching NetworkClient can continue to work exclusively in > the foreground thread and we can confine the coordinator NetworkClient to the > background thread. This provides much better isolation and avoids all of the > race conditions with calling poll() from two threads. The main complication > is in how to expose blocking APIs to interact with the background thread. For > example, in the current consumer API, rebalance are completed in the > foreground thread, so we would need to coordinate with the background thread > to preserve this (e.g. by using a Future abstraction). -- This message was sent by Atlassian JIRA (v6.3.4#6332)