axiqia commented on issue #4303: Unable to connect to controller0 while running 
openwhisk.yml (local deployment), Connection refused!
URL: 
https://github.com/apache/incubator-openwhisk/issues/4303#issuecomment-498677541
 
 
   Dear all,
   I met the issue again! 
   
   When I ran openwhisk.yml, it failed by `Connection refused`.
   ```
   AILED - RETRYING: wait until the Controller in this host is up and running 
(4 retries left).
   FAILED - RETRYING: wait until the Controller in this host is up and running 
(3 retries left).
   FAILED - RETRYING: wait until the Controller in this host is up and running 
(2 retries left).
   FAILED - RETRYING: wait until the Controller in this host is up and running 
(1 retries left).
   fatal: [controller0]: FAILED! => {"attempts": 12, "changed": false, 
"content": "", "msg": "Status code was -1 and not [200]: Request failed: 
<urlopen error [Errno 111] Connection refused>", "redirected": false, "status": 
-1, "url": "https://172.17.0.1:10001/ping"}
   
   Status code was -1 and not [200]: Request failed: <urlopen error [Errno
   111] Connection refused>
   
   PLAY RECAP 
****************************************************************************************************************************************************************************
   controller0                : ok=25   changed=7    unreachable=0    failed=1
   kafka0                     : ok=11   changed=5    unreachable=0    failed=0
   
   ```
   
   
   So I checked the log file immediately, and there is no error.
   ```
   .........
   [2019-06-04T13:16:21.424Z] [INFO] [#tid_sid_unknown] [Config] environment 
set value for limits.actions.sequence.maxLength
   [2019-06-04T13:16:21.424Z] [INFO] [#tid_sid_unknown] [Config] environment 
set value for limits.actions.invokes.concurrent
   [2019-06-04T13:16:21.425Z] [INFO] [#tid_sid_unknown] [Config] environment 
set value for controller.instances
   [2019-06-04T13:16:21.425Z] [INFO] [#tid_sid_unknown] [Config] environment 
set value for limits.actions.invokes.perMinute
   [2019-06-04T13:16:21.425Z] [INFO] [#tid_sid_unknown] [Config] environment 
set value for runtimes.manifest
   [2019-06-04T13:16:21.426Z] [INFO] [#tid_sid_unknown] [Config] environment 
set value for kafka.hosts
   [2019-06-04T13:16:21.426Z] [INFO] [#tid_sid_unknown] [Config] environment 
set value for port
   [2019-06-04T13:16:28.344Z] [INFO] [#tid_sid_unknown] 
[KafkaMessagingProvider] created topic completed0
   [2019-06-04T13:16:56.275Z] [INFO] [#tid_sid_unknown] 
[KafkaMessagingProvider] created topic health
   [2019-06-04T13:17:47.819Z] [INFO] [#tid_sid_unknown] 
[KafkaMessagingProvider] created topic cacheInvalidation
   
   ```
   
   
   But five minutes later, I checked the log file again, and it contained more 
information as below.
   
   ```
   [2019-06-04T13:19:47.953Z] [WARN] [#tid_sid_unknown] 
[KafkaMessagingProvider] topic events could not be created because of 
org.apache.kafka.common.errors.TimeoutException: The request timed out., 
retries left: 5
   [2019-06-04T13:20:31.962Z] [INFO] [#tid_sid_unknown] 
[KafkaMessagingProvider] topic events already existed
   [2019-06-04T13:20:32.200Z] [INFO] [#tid_sid_controller] [Controller] 
starting controller instance 0 [marker:controller_startup0_counter:250778]
   [2019-06-04T13:20:32.201Z] [INFO] Starting the Kamon(SystemMetrics) module
   [2019-06-04T13:20:32.968Z] [INFO] [#tid_sid_dispatcher] [MessageFeed] 
handler capacity = 128, pipeline fill at = 128, pipeline depth = 256
   [2019-06-04T13:20:33.113Z] [INFO] [#tid_sid_loadbalancer] 
[ShardingContainerPoolBalancerState] managedFraction = 0.9, blackboxFraction = 
0.1
   [2019-06-04T13:20:33.141Z] [INFO] [#tid_sid_dispatcher] [MessageFeed] 
handler capacity = 128, pipeline fill at = 128, pipeline depth = 256
   [2019-06-04T13:20:33.407Z] [INFO] [#tid_sid_loadbalancer] [WhiskAction] 
[GET] serving from datastore: CacheKey(whisk.system/invokerHealthTestAction0) 
[marker:database_cacheMiss_counter:251985]
   [2019-06-04T13:20:33.443Z] [INFO] [#tid_sid_loadbalancer] [CouchDbRestStore] 
[GET] 'whisk_local_whisks' finding document: 'id: 
whisk.system/invokerHealthTestAction0' 
[marker:database_getDocument_start:252022]
   [2019-06-04T13:20:33.760Z] [INFO] [#tid_sid_loadbalancer] [CouchDbRestStore] 
 [marker:database_getDocument_finish:252339:315]
   [2019-06-04T13:20:33.763Z] [INFO] [#tid_sid_loadbalancer] [WhiskAction] 
invalidating CacheKey(whisk.system/invokerHealthTestAction0)
   [2019-06-04T13:20:33.771Z] [INFO] [#tid_sid_loadbalancer] [WhiskAction] 
write initiated on new cache entry
   [2019-06-04T13:20:33.776Z] [INFO] [#tid_sid_loadbalancer] [CouchDbRestStore] 
[PUT] 'whisk_local_whisks' saving document: 'id: 
whisk.system/invokerHealthTestAction0, rev: null' 
[marker:database_saveDocument_start:252355]
   [2019-06-04T13:20:33.798Z] [INFO] [#tid_sid_loadbalancer] [InvokerPool] test 
action for invoker health now exists
   [2019-06-04T13:20:33.913Z] [INFO] [#tid_sid_controller] [Controller] 
loadbalancer initialized: ShardingContainerPoolBalancer
   [2019-06-04T13:20:33.915Z] [INFO] [#tid_sid_dispatcher] [MessageFeed] 
handler capacity = 128, pipeline fill at = 128, pipeline depth = 256
   [2019-06-04T13:20:33.919Z] [INFO] [#tid_sid_loadbalancer] [CouchDbRestStore] 
 [marker:database_saveDocument_finish:252496:141]
   [2019-06-04T13:20:33.927Z] [INFO] [#tid_sid_loadbalancer] [WhiskAction] 
write all done, caching CacheKey(whisk.system/invokerHealthTestAction0) Cached
   [2019-06-04T13:20:33.995Z] [INFO] [#tid_sid_controller] [KindRestrictor] all 
kinds are allowed, the white-list is not specified
   [2019-06-04T13:20:34.394Z] [INFO] [#tid_sid_controller] [ActionsApi] 
actionSequenceLimit '99'
   [2019-06-04T13:23:24.224Z] [ERROR] [Consumer clientId=consumer-health, 
groupId=health0] Offset commit failed on partition health-0 at offset 0: The 
coordinator is not aware of this member.
   [2019-06-04T13:23:24.232Z] [ERROR] [#tid_sid_dispatcher] [MessageFeed] 
failed to commit ping consumer offset: 
org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be 
completed since the group has already rebalanced and assigned the partitions to 
another member. This means that the time between subsequent calls to poll() was 
longer than the configured max.poll.interval.ms, which typically implies that 
the poll loop is spending too much time message processing. You can address 
this either by increasing the session timeout or by reducing the maximum size 
of batches returned in poll() with max.poll.records.
   ```
   
   Which means that the issue arises between 13:17:47 and 13:19:47. Is it 
caused by kafka creating topic events? And how can I handle it?
   
   Any response will be appreciated!

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

Reply via email to