axiqia edited a comment on issue #4303: Unable to connect to controller0 while running openwhisk.yml (local deployment), Connection refused! URL: https://github.com/apache/incubator-openwhisk/issues/4303#issuecomment-498677541

Dear all, I have hit this issue again! When I ran openwhisk.yml, it failed with `Connection refused`:

```
FAILED - RETRYING: wait until the Controller in this host is up and running (4 retries left).
FAILED - RETRYING: wait until the Controller in this host is up and running (3 retries left).
FAILED - RETRYING: wait until the Controller in this host is up and running (2 retries left).
FAILED - RETRYING: wait until the Controller in this host is up and running (1 retries left).
fatal: [controller0]: FAILED! => {"attempts": 12, "changed": false, "content": "", "msg": "Status code was -1 and not [200]: Request failed: <urlopen error [Errno 111] Connection refused>", "redirected": false, "status": -1, "url": "https://172.17.0.1:10001/ping"}
Status code was -1 and not [200]: Request failed: <urlopen error [Errno 111] Connection refused>

PLAY RECAP ****************************************************************************************************************************************************************************
controller0                : ok=25   changed=7    unreachable=0    failed=1
kafka0                     : ok=11   changed=5    unreachable=0    failed=0
```

So I checked the controller log file immediately, and there was no error. As you can see, the log timestamps stop at 13:17:47:

```
.........
[2019-06-04T13:16:21.424Z] [INFO] [#tid_sid_unknown] [Config] environment set value for limits.actions.sequence.maxLength
[2019-06-04T13:16:21.424Z] [INFO] [#tid_sid_unknown] [Config] environment set value for limits.actions.invokes.concurrent
[2019-06-04T13:16:21.425Z] [INFO] [#tid_sid_unknown] [Config] environment set value for controller.instances
[2019-06-04T13:16:21.425Z] [INFO] [#tid_sid_unknown] [Config] environment set value for limits.actions.invokes.perMinute
[2019-06-04T13:16:21.425Z] [INFO] [#tid_sid_unknown] [Config] environment set value for runtimes.manifest
[2019-06-04T13:16:21.426Z] [INFO] [#tid_sid_unknown] [Config] environment set value for kafka.hosts
[2019-06-04T13:16:21.426Z] [INFO] [#tid_sid_unknown] [Config] environment set value for port
[2019-06-04T13:16:28.344Z] [INFO] [#tid_sid_unknown] [KafkaMessagingProvider] created topic completed0
[2019-06-04T13:16:56.275Z] [INFO] [#tid_sid_unknown] [KafkaMessagingProvider] created topic health
[2019-06-04T13:17:47.819Z] [INFO] [#tid_sid_unknown] [KafkaMessagingProvider] created topic cacheInvalidation
```

But five minutes later, I checked the log file again, and it contained more information, as below.
```
[2019-06-04T13:19:47.953Z] [WARN] [#tid_sid_unknown] [KafkaMessagingProvider] topic events could not be created because of org.apache.kafka.common.errors.TimeoutException: The request timed out., retries left: 5
[2019-06-04T13:20:31.962Z] [INFO] [#tid_sid_unknown] [KafkaMessagingProvider] topic events already existed
[2019-06-04T13:20:32.200Z] [INFO] [#tid_sid_controller] [Controller] starting controller instance 0 [marker:controller_startup0_counter:250778]
[2019-06-04T13:20:32.201Z] [INFO] Starting the Kamon(SystemMetrics) module
[2019-06-04T13:20:32.968Z] [INFO] [#tid_sid_dispatcher] [MessageFeed] handler capacity = 128, pipeline fill at = 128, pipeline depth = 256
[2019-06-04T13:20:33.113Z] [INFO] [#tid_sid_loadbalancer] [ShardingContainerPoolBalancerState] managedFraction = 0.9, blackboxFraction = 0.1
[2019-06-04T13:20:33.141Z] [INFO] [#tid_sid_dispatcher] [MessageFeed] handler capacity = 128, pipeline fill at = 128, pipeline depth = 256
[2019-06-04T13:20:33.407Z] [INFO] [#tid_sid_loadbalancer] [WhiskAction] [GET] serving from datastore: CacheKey(whisk.system/invokerHealthTestAction0) [marker:database_cacheMiss_counter:251985]
[2019-06-04T13:20:33.443Z] [INFO] [#tid_sid_loadbalancer] [CouchDbRestStore] [GET] 'whisk_local_whisks' finding document: 'id: whisk.system/invokerHealthTestAction0' [marker:database_getDocument_start:252022]
[2019-06-04T13:20:33.760Z] [INFO] [#tid_sid_loadbalancer] [CouchDbRestStore] [marker:database_getDocument_finish:252339:315]
[2019-06-04T13:20:33.763Z] [INFO] [#tid_sid_loadbalancer] [WhiskAction] invalidating CacheKey(whisk.system/invokerHealthTestAction0)
[2019-06-04T13:20:33.771Z] [INFO] [#tid_sid_loadbalancer] [WhiskAction] write initiated on new cache entry
[2019-06-04T13:20:33.776Z] [INFO] [#tid_sid_loadbalancer] [CouchDbRestStore] [PUT] 'whisk_local_whisks' saving document: 'id: whisk.system/invokerHealthTestAction0, rev: null' [marker:database_saveDocument_start:252355]
[2019-06-04T13:20:33.798Z] [INFO] [#tid_sid_loadbalancer] [InvokerPool] test action for invoker health now exists
[2019-06-04T13:20:33.913Z] [INFO] [#tid_sid_controller] [Controller] loadbalancer initialized: ShardingContainerPoolBalancer
[2019-06-04T13:20:33.915Z] [INFO] [#tid_sid_dispatcher] [MessageFeed] handler capacity = 128, pipeline fill at = 128, pipeline depth = 256
[2019-06-04T13:20:33.919Z] [INFO] [#tid_sid_loadbalancer] [CouchDbRestStore] [marker:database_saveDocument_finish:252496:141]
[2019-06-04T13:20:33.927Z] [INFO] [#tid_sid_loadbalancer] [WhiskAction] write all done, caching CacheKey(whisk.system/invokerHealthTestAction0) Cached
[2019-06-04T13:20:33.995Z] [INFO] [#tid_sid_controller] [KindRestrictor] all kinds are allowed, the white-list is not specified
[2019-06-04T13:20:34.394Z] [INFO] [#tid_sid_controller] [ActionsApi] actionSequenceLimit '99'
[2019-06-04T13:23:24.224Z] [ERROR] [Consumer clientId=consumer-health, groupId=health0] Offset commit failed on partition health-0 at offset 0: The coordinator is not aware of this member.
[2019-06-04T13:23:24.232Z] [ERROR] [#tid_sid_dispatcher] [MessageFeed] failed to commit ping consumer offset: org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing the session timeout or by reducing the maximum size of batches returned in poll() with max.poll.records.
```

This means the issue arose between 13:17:47 and 13:19:47. Was it caused by Kafka failing to create the topic `events`? And how can I handle it? Any response would be appreciated!
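As a side note, the Ansible task that produced the `FAILED - RETRYING` lines is just polling `https://172.17.0.1:10001/ping` until it returns 200. A minimal sketch of doing the same probe by hand, so you can watch when the controller actually comes up (the `wait_for_ping` helper and its defaults are my own illustration, not part of OpenWhisk):

```python
import ssl
import time
import urllib.error
import urllib.request

def wait_for_ping(url, retries=12, delay=5, probe=None):
    """Poll `url` until it answers HTTP 200, mirroring the Ansible
    'wait until the Controller in this host is up and running' task.
    Returns the attempt number that succeeded, or None if it gave up."""
    if probe is None:
        # Local deployments serve the API over a self-signed cert,
        # so skip verification for this diagnostic probe only.
        ctx = ssl.create_default_context()
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
        probe = lambda u: urllib.request.urlopen(u, timeout=5, context=ctx).status == 200
    for attempt in range(1, retries + 1):
        try:
            if probe(url):
                return attempt
        except (urllib.error.URLError, OSError):
            pass  # e.g. <urlopen error [Errno 111] Connection refused> while the JVM starts
        time.sleep(delay)
    return None

# Example: wait_for_ping("https://172.17.0.1:10001/ping")
```

Separately, the `CommitFailedException` at 13:23 names its own knobs (increase `max.poll.interval.ms` or the session timeout, or reduce `max.poll.records` in the consumer configuration), though it may just be a symptom of the broker being slow or unreachable during the same window.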
