[ https://issues.apache.org/jira/browse/APEXMALHAR-2518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16081461#comment-16081461 ]
ASF GitHub Bot commented on APEXMALHAR-2518:
--------------------------------------------

GitHub user PramodSSImmaneni opened a pull request:

    https://github.com/apache/apex-malhar/pull/644

    APEXMALHAR-2518 Terminating operator when there is a server error in processing commit offsets

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/PramodSSImmaneni/apex-malhar APEXMALHAR-2518

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/apex-malhar/pull/644.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #644

----
commit df4798e3a7e048e25cc7790951d7462dab257cd3
Author: Pramod Immaneni <pra...@datatorrent.com>
Date:   2017-07-11T01:09:38Z

    APEXMALHAR-2518 Terminating operator execution when there is an error in commit offset processing

----

> Kafka input operator stops reading tuples when there is a UNKNOWN_MEMBER_ID error during committed offset processing
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: APEXMALHAR-2518
>                 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2518
>             Project: Apache Apex Malhar
>          Issue Type: Bug
>            Reporter: Pramod Immaneni
>            Assignee: Pramod Immaneni
>
> The Kafka 0.9 operator stores offsets that are completely processed and no
> longer needed (committed offsets) back in Kafka. It does so by making a Kafka
> API call. If the response from the Kafka server to this call comes back with
> an UNKNOWN_MEMBER_ID error, the Kafka consumer state changes to needing
> partition re-assignment and the consumer returns no further messages. A
> couple of other errors result in the same state, including when a rebalance
> is in progress.
>
> What exactly caused this error is not known, but the following is the likely
> reason, given the conditions surrounding the application. When the operator
> has temporarily stalled due to back-pressure exerted by a slow downstream,
> it will eventually stall the operator's Kafka consumer thread that reads
> messages from Kafka. The thread then makes no Kafka consumer API calls, so
> no heartbeats are sent to the Kafka server. This can cause the server to
> evict the consumer after a timeout period. This could have been the cause of
> the UNKNOWN_MEMBER_ID error.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
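
For illustration only, the approach named in the pull request title (terminating the operator when a commit-offset error occurs, rather than letting the consumer silently stop returning messages) might be sketched as below. This is not the code from the pull request. The class `CommitErrorGuard` and its methods are hypothetical names, and the sketch assumes the Kafka client reports a failed commit as an exception passed to a commit callback (the real client exposes `OffsetCommitCallback.onComplete(offsets, exception)` for this purpose). No Kafka dependency is used, so the failure is simulated with a plain exception.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical helper: records the first commit failure seen on the consumer
// thread so that the operator thread can fail fast instead of stalling.
public class CommitErrorGuard {

    // Holds the first commit error; null while all commits have succeeded.
    private final AtomicReference<Exception> commitError = new AtomicReference<>();

    // Assumed to be invoked from the commit callback on the consumer thread.
    // A null exception means the commit succeeded.
    public void onCommitComplete(Exception exception) {
        if (exception != null) {
            commitError.compareAndSet(null, exception);
        }
    }

    // Assumed to be invoked periodically from the operator thread.
    // Throwing here terminates the operator so the platform can redeploy it.
    public void checkForCommitError() {
        Exception e = commitError.get();
        if (e != null) {
            throw new RuntimeException("Offset commit failed; terminating operator", e);
        }
    }

    public static void main(String[] args) {
        CommitErrorGuard guard = new CommitErrorGuard();
        guard.checkForCommitError(); // no error recorded yet, returns normally

        // Simulate the server rejecting a commit (stand-in for UNKNOWN_MEMBER_ID).
        guard.onCommitComplete(new IllegalStateException("UNKNOWN_MEMBER_ID"));
        try {
            guard.checkForCommitError();
        } catch (RuntimeException e) {
            System.out.println("operator terminated: " + e.getCause().getMessage());
        }
    }
}
```

The error is stored rather than thrown directly from the callback because, in this sketch, the callback is assumed to run on the consumer thread, while only an exception raised on the operator thread would end the operator.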