[ https://issues.apache.org/jira/browse/KAFKA-5797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16143183#comment-16143183 ]
ASF GitHub Bot commented on KAFKA-5797:
---------------------------------------

GitHub user guozhangwang opened a pull request:

    https://github.com/apache/kafka/pull/3748

    KAFKA-5797: Delay checking of partition existence in StoreChangelogReader

    1. Remove the timeout-based `validatePartitionExists` from StoreChangelogReader; instead, try to refresh metadata only once, after all tasks have been created and their topologies initialized (hence all stores have been registered).
    2. Add logic to refresh partition metadata at the end of initialization if some restorers that need initialization cannot find their changelogs, in the hope that these stores can find their changelogs in the next run loop.

    As a result, we never call `consumer#partitionsFor` any more, only `consumer#listTopics`. The only blocking calls left are `listTopics` and `endOffsets`; we always catch timeout exceptions around these two calls and retry in the next run loop after refreshing the metadata.

    This also reduces the number of request round trips between the consumer and the brokers.
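The retry behavior described above can be sketched roughly as follows. This is not the code from the pull request: `ChangelogRetrySketch`, `MetadataSource`, `MetadataTimeoutException`, `register`, and `initializeOnce` are hypothetical names invented for illustration, and `MetadataSource#listTopics` only stands in for the blocking `consumer#listTopics` call so that the sketch runs without a broker or the Kafka client library.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

public class ChangelogRetrySketch {

    /** Hypothetical stand-in for the blocking consumer#listTopics call. */
    public interface MetadataSource {
        /** Returns topic name -> partition count; may throw MetadataTimeoutException. */
        Map<String, Integer> listTopics();
    }

    /** Hypothetical stand-in for a client-side timeout exception. */
    public static class MetadataTimeoutException extends RuntimeException {
        public MetadataTimeoutException(String message) {
            super(message);
        }
    }

    /** A changelog partition that a registered store still needs to find. */
    private static final class Pending {
        final String topic;
        final int partition;

        Pending(String topic, int partition) {
            this.topic = topic;
            this.partition = partition;
        }
    }

    private final MetadataSource source;
    private final List<Pending> needsInitialization = new ArrayList<>();
    private Map<String, Integer> cachedMetadata = new HashMap<>();

    public ChangelogRetrySketch(MetadataSource source) {
        this.source = source;
    }

    /** Registering a store does not touch the broker; existence is checked later. */
    public void register(String changelogTopic, int partition) {
        needsInitialization.add(new Pending(changelogTopic, partition));
    }

    /**
     * One pass of the run loop. Resolves pending changelogs against the cached
     * metadata, then refreshes the cache at most once if anything is still
     * missing. Returns true once every registered changelog has been found.
     */
    public boolean initializeOnce() {
        Iterator<Pending> it = needsInitialization.iterator();
        while (it.hasNext()) {
            Pending p = it.next();
            Integer partitionCount = cachedMetadata.get(p.topic);
            if (partitionCount != null && p.partition < partitionCount) {
                it.remove(); // changelog partition exists; restoration could start
            }
        }
        if (!needsInitialization.isEmpty()) {
            try {
                cachedMetadata = source.listTopics();
            } catch (MetadataTimeoutException e) {
                // Do not propagate to the caller; keep the old cache and
                // try again in the next run loop.
            }
        }
        return needsInitialization.isEmpty();
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Scripted source: the first call times out, later calls succeed.
        MetadataSource scripted = () -> {
            calls[0]++;
            if (calls[0] == 1) {
                throw new MetadataTimeoutException("simulated timeout");
            }
            Map<String, Integer> topics = new HashMap<>();
            topics.put("app-store-changelog", 2);
            return topics;
        };
        ChangelogRetrySketch reader = new ChangelogRetrySketch(scripted);
        reader.register("app-store-changelog", 1);
        int loops = 1;
        while (!reader.initializeOnce()) {
            loops++;
        }
        System.out.println("initialized after " + loops + " run loops");
    }
}
```

The point of the design is that a missing partition or a timeout leaves the store in the pending set instead of throwing up to the user's exception handler, and each run loop issues at most one metadata request.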
You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/guozhangwang/kafka K5797-handle-metadata-available

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/kafka/pull/3748.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3748

----
commit a1cbd208007a0a5e73ff917987a457662554c04c
Author: Guozhang Wang <wangg...@gmail.com>
Date:   2017-08-27T03:47:01Z

    handlg timeout exception

commit 2348799f54722c67f9837133a939e3f982b543d9
Author: Guozhang Wang <wangg...@gmail.com>
Date:   2017-08-27T05:25:21Z

    fix unit tests

----

> StoreChangelogReader should be resilient to broker-side metadata not available
> ------------------------------------------------------------------------------
>
>                 Key: KAFKA-5797
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5797
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>            Reporter: Guozhang Wang
>            Assignee: Guozhang Wang
>
> In {{StoreChangelogReader#validatePartitionExists}}, if the metadata for the
> required partition is not available, or a timeout exception is thrown, today
> the function would directly throw the exception all the way up to user's
> exception handlers.
> Since we have now extracted the restoration out of the consumer callback, a
> better way to handle this, is to only validate the partition during
> restoring, and if it does not exist we can just proceed and retry in the next
> loop

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)