style95 commented on PR #5332:
URL: https://github.com/apache/openwhisk/pull/5332#issuecomment-1275478437

   ### 1. Disable the scheduler
   ```
   [2022-10-11T09:31:48.791Z] [WARN] [#tid_BOjlyL4vneiwdeJc422InK80fwojMF0e] 
[FPCSchedulerServer] Scheduler is disabled
   ```
   
   ### 2. Shutting down the queue manager
   ```
   [2022-10-11T09:31:48.802Z] [INFO] [#tid_sid_unknown] [QueueManager] 
Gracefully shutdown the queue manager
   ```
   
   ### 3. Shutting down the queue
   ```
   [2022-10-11T09:31:48.805Z] [INFO] [#tid_sid_unknown] [MemoryQueue] 
[guest:guest/[email protected]:Running] Gracefully shutdown the memory queue.
   ```
   
   ### 4. Queue endpoints are removed
   
   ```
   [2022-10-11T09:31:48.805Z] [INFO] [#tid_sid_unknown] [WatcherService] 
unwatch endpoint: 
UnwatchEndpoint(bmt-1480/queue/guest/guest/noop0/leader,false,data-management-service,true)
   [2022-10-11T09:31:48.805Z] [INFO] [#tid_sid_unknown] [WatcherService] 
unwatch endpoint: 
UnwatchEndpoint(bmt-1480/throttling/guest,false,data-management-service,true)
   [2022-10-11T09:31:48.805Z] [INFO] [#tid_sid_unknown] [WatcherService] 
unwatch endpoint: 
UnwatchEndpoint(bmt-1480/throttling/guest/guest/noop0,false,data-management-service,true)
   [2022-10-11T09:31:48.805Z] [INFO] [#tid_sid_unknown] [QueueManager] Remove 
init revision map cause queue is removed, key: 
bmt-1480/queue/guest/guest/noop0/leader
   ```
   
   ### 5. A queue is in the removing state
   
   It's waiting for buffered activations to be handled.
   
   ```
   [2022-10-11T09:31:48.808Z] [INFO] [#tid_sid_unknown] [MemoryQueue] 
[guest:guest/[email protected]:Removing] Queue is going to stop but there are still 0 
activations and 17 request buffered.
   [2022-10-11T09:31:48.811Z] [INFO] [#tid_sid_unknown] [MemoryQueue] 
[guest:guest/[email protected]:Removing] This queue is shutdown by `/disable` api, do 
nothing here.
   ```
   
   ### 5. New activation comes and it recovers the queue
   
   In the meantime, new activation comes and it tries to recover a queue.
   
   ```
   [2022-10-11T09:31:48.807Z] [INFO] [#tid_3168d300edca4fda4e2927be9dd35bf8] 
[QueueManager] Got activation message b6cec1514fa545828ec1514fa5558251 for 
Namespace(guest,23bc46b1-71f6-4ed5-8c54-816aa4f8c502)/guest/[email protected] from 
kafka.
   [2022-10-11T09:31:48.807Z] [INFO] [#tid_3168d300edca4fda4e2927be9dd35bf8] 
[QueueManager] [b6cec1514fa545828ec1514fa5558251] the key 
bmt-1480/queue/guest/guest/noop0/leader is not in the initRevisionMap. 
revision: 15-5aa72ce8d0070b26994803ac8c22de81
   [2022-10-11T09:31:48.807Z] [INFO] [#tid_85ddc3a977a53a14f6a709d503534b92] 
[QueueManager] Recover a queue for guest/[email protected],
   ```
   
   
   ### 6. All incoming activations trigger the recovery procedure
   
   ```
   [2022-10-11T09:31:48.822Z] [INFO] [#tid_c4bc33dcb78ef6f5794af9e424d0c7c8] 
[QueueManager] Got activation message 77d486061de24c869486061de28c8606 for 
Namespace(guest,23bc46b1-71f6-4ed5-8c54-816aa4f8c502)/guest/[email protected] from 
remote queue manager.
   [2022-10-11T09:31:48.822Z] [INFO] [#tid_c4bc33dcb78ef6f5794af9e424d0c7c8] 
[QueueManager] Recover a queue for guest/[email protected],
   [2022-10-11T09:31:48.823Z] [INFO] [#tid_3168d300edca4fda4e2927be9dd35bf8] 
[QueueManager] Got activation message b6cec1514fa545828ec1514fa5558251 for 
Namespace(guest,23bc46b1-71f6-4ed5-8c54-816aa4f8c502)/guest/[email protected] from 
remote queue manager.
   [2022-10-11T09:31:48.823Z] [INFO] [#tid_3168d300edca4fda4e2927be9dd35bf8] 
[QueueManager] Recover a queue for guest/[email protected],
   [2022-10-11T09:31:48.823Z] [INFO] [#tid_3ab104600fa6cc0628d56b4e2c56c52d] 
[QueueManager] Got activation message 4864ed562a78408fa4ed562a78308f75 for 
Namespace(guest,23bc46b1-71f6-4ed5-8c54-816aa4f8c502)/guest/[email protected] from 
remote queue manager.
   [2022-10-11T09:31:48.823Z] [INFO] [#tid_3ab104600fa6cc0628d56b4e2c56c52d] 
[QueueManager] Recover a queue for guest/[email protected],
   [2022-10-11T09:31:48.823Z] [INFO] [#tid_b4f3a44cfe6aac7fd242d03b4aa791c2] 
[QueueManager] Got activation message 0db275dac7b84458b275dac7b8b45842 for 
Namespace(guest,23bc46b1-71f6-4ed5-8c54-816aa4f8c502)/guest/[email protected] from 
remote queue manager.
   [2022-10-11T09:31:48.823Z] [INFO] [#tid_b4f3a44cfe6aac7fd242d03b4aa791c2] 
[QueueManager] Recover a queue for guest/[email protected],
   [2022-10-11T09:31:48.824Z] [INFO] [#tid_c85be9165297bc619860e1a26cc04649] 
[QueueManager] Got activation message 3ca322823217472fa322823217072f15 for 
Namespace(guest,23bc46b1-71f6-4ed5-8c54-816aa4f8c502)/guest/[email protected] from 
remote queue manager.
   [2022-10-11T09:31:48.824Z] [INFO] [#tid_c85be9165297bc619860e1a26cc04649] 
[QueueManager] Recover a queue for guest/[email protected],
   [2022-10-11T09:31:48.824Z] [INFO] [#tid_91cb27040fff0411ca0b4bb985973e74] 
[QueueManager] Got activation message b0e69df6995b4962a69df6995ba96246 for 
Namespace(guest,23bc46b1-71f6-4ed5-8c54-816aa4f8c502)/guest/[email protected] from 
remote queue manager.
   [2022-10-11T09:31:48.824Z] [INFO] [#tid_91cb27040fff0411ca0b4bb985973e74] 
[QueueManager] Recover a queue for guest/[email protected],
   [2022-10-11T09:31:48.824Z] [INFO] [#tid_a9fa02e03083ae8a1328a37047e5c6be] 
[QueueManager] Got activation message 6d5c80bb62eb4a7f9c80bb62eb8a7f92 for 
Namespace(guest,23bc46b1-71f6-4ed5-8c54-816aa4f8c502)/guest/[email protected] from 
remote queue manager.
   [2022-10-11T09:31:48.824Z] [INFO] [#tid_a9fa02e03083ae8a1328a37047e5c6be] 
[QueueManager] Recover a queue for guest/[email protected],
   ```
   
   ### 7. The queue is recreated
   It is also added to the init revision map and other recovery attempts are 
ignored.
   ```
   [2022-10-11T09:31:48.884Z] [WARN] [#tid_3ab104600fa6cc0628d56b4e2c56c52d] 
[QueueManager] recreate queue for guest/[email protected]
   [2022-10-11T09:31:48.884Z] [INFO] [#tid_sid_unknown] [QueueManager] Update 
init revision map, key: bmt-1480/queue/guest/guest/noop0/leader, rev: 
15-5aa72ce8d0070b26994803ac8c22de81
   [2022-10-11T09:31:48.885Z] [INFO] [#tid_sid_unknown] [MemoryQueue] 
[guest:guest/[email protected]:Uninitialized] a new queue is created, 
retentionTimeout: 60000, kind: nodejs:10.
   [2022-10-11T09:31:48.885Z] [INFO] [#tid_a9fa02e03083ae8a1328a37047e5c6be] 
[QueueManager] Queue for action guest/[email protected] is already recovered, skip
   ```
   
   ### 8. Shutdown the queue again
   
   This newly created queue even received an activation but it receives 
`GracefulShutdown` again.
   And the queue endpoint is also removed.
   ```
   [2022-10-11T09:31:48.885Z] [INFO] [#tid_3ab104600fa6cc0628d56b4e2c56c52d] 
[MemoryQueue] [guest:guest/[email protected]:Running] got a new activation message 
4864ed562a78408fa4ed562a78308f75
   [2022-10-11T09:31:48.885Z] [INFO] [#tid_sid_unknown] [MemoryQueue] 
[guest:guest/[email protected]:Running] Gracefully shutdown the memory queue.
   [2022-10-11T09:31:48.885Z] [INFO] [#tid_sid_unknown] [WatcherService] 
unwatch endpoint: 
UnwatchEndpoint(bmt-1480/queue/guest/guest/noop0/leader,false,data-management-service,true)
   ```
   
   This is because the scheduler is in the `shuttingdown` state.
   ```scala
     private def recreateQueue(action: FullyQualifiedEntityName,
                               msg: ActivationMessage,
                               actionMetaData: WhiskActionMetaData): Unit = {
       logging.warn(this, s"recreate queue for ${msg.action}")(msg.transid)
       val queue = createAndStartQueue(msg.user.namespace.name.asString, 
action, msg.revision, actionMetaData)
       queue ! msg
       msg.transid.mark(this, LoggingMarkers.SCHEDULER_QUEUE_RECOVER)
       if (isShuttingDown) {
         queue ! GracefulShutdown.  // here
       }
     }
   ```
   
   ### 9. While the queue is going to stop another queue is created again
   And it also shut down the queue again.
   ```
   [2022-10-11T09:31:48.886Z] [WARN] [#tid_738382b188d8c89f5f517a02c4bdd2c6] 
[QueueManager] recreate queue for guest/[email protected]
   [2022-10-11T09:31:48.888Z] [INFO] [#tid_sid_unknown] [MemoryQueue] 
[guest:guest/[email protected]:Running] Gracefully shutdown the memory queue.
   [2022-10-11T09:31:48.888Z] [INFO] [#tid_sid_unknown] [QueueManager] Remove 
init revision map cause queue is removed, key: 
bmt-1480/queue/guest/guest/noop0/leader
   ```
   
   ### 10. Shutdown the feed
   This problem was repeatedly happening until the feed is also shut down and 
no more activation was coming.
   
   ```
   [2022-10-11T09:31:49.306Z] [INFO] [#tid_sid_unknown] 
[KafkaConsumerConnector] wakeup consumer for 'creationAck0'
   [2022-10-11T09:31:49.306Z] [INFO] [#tid_sid_unknown] 
[KafkaConsumerConnector] WakeupException happened when do commit action for 
topic creationAck0
   [2022-10-11T09:31:49.307Z] [INFO] [#tid_sid_unknown] 
[KafkaConsumerConnector] closing consumer for 'creationAck0'
   ```
   
   ### 11. Stop the scheduler when there are still activations in the queue
   
   Since no more activation was coming, all created queues (for the same 
action) were in the removing state and no queue reference was in the 
`QueuePool`. Since the current deployment script only checks if there is a 
queue in the scheduler, it just redeployed the scheduler while there are still 
activations in the queues.
   
   We can see non-zero `current` values.
   
   ```
   [2022-10-11T09:31:53.109Z] [INFO] [#tid_sid_unknown] 
[SchedulingDecisionMaker] [queue snapshot] initialized: false, 
avgDuration:Some(1.9665046118449014), stateName:Removing, incoming:0, 
current:2, existing:0, in-progress:2, stale:2, existingInNs:94, 
in-progressInNs:81
   [2022-10-11T09:31:53.109Z] [INFO] [#tid_sid_unknown] 
[SchedulingDecisionMaker] [queue snapshot] initialized: true, 
avgDuration:Some(1.9665114532673995), stateName:Removing, incoming:0, 
current:2, existing:0, in-progress:2, stale:2, existingInNs:94, 
in-progressInNs:81
   [2022-10-11T09:31:53.109Z] [INFO] [#tid_sid_unknown] 
[SchedulingDecisionMaker] [queue snapshot] initialized: true, 
avgDuration:Some(1.9665046118449014), stateName:Removing, incoming:0, 
current:3, existing:0, in-progress:3, stale:3, existingInNs:94, 
in-progressInNs:81
   [2022-10-11T09:31:53.109Z] [INFO] [#tid_sid_unknown] 
[SchedulingDecisionMaker] [queue snapshot] initialized: true, 
avgDuration:Some(1.9664630890896282), stateName:Removing, incoming:0, 
current:3, existing:0, in-progress:3, stale:3, existingInNs:94, 
in-progressInNs:81
   [2022-10-11T09:31:53.109Z] [INFO] [#tid_sid_unknown] 
[SchedulingDecisionMaker] [queue snapshot] initialized: true, 
avgDuration:Some(1.966497872446949), stateName:Removing, incoming:0, 
current:18, existing:0, in-progress:18, stale:18, existingInNs:94, 
in-progressInNs:81
   [2022-10-11T09:31:53.181Z] [INFO] Cluster Node 
[akka://[email protected]:25520] - Leader is moving node 
[akka://[email protected]:25520] to [Exiting]
   [2022-10-11T09:31:53.183Z] [INFO] Cluster Node 
[akka://[email protected]:25520] - Exiting completed
   [2022-10-11T09:31:53.186Z] [INFO] Cluster Node 
[akka://[email protected]:25520] - Shutting down...
   [2022-10-11T09:31:53.187Z] [INFO] Cluster Node 
[akka://[email protected]:25520] - Successfully shut down
   [2022-10-11T09:31:53.198Z] [INFO] [#tid_sid_unknown] [Scheduler] Shutting 
down Kamon with coordinated shutdown
   [2022-10-11T09:31:53.201Z] [INFO] Shutting down remote daemon.
   [2022-10-11T09:31:53.205Z] [INFO] Remote daemon shut down; proceeding with 
flushing remote transports.
   ```
   
   When the scheduler was stopped, there were around 10 different queues for 
the same actions and 33 activations were in them.
   
   ```
   [2022-10-11T09:31:51.309Z]initialized: true, 
avgDuration:Some(1.9665047142459378), stateName:Removing, incoming:0, 
current:2, existing:29, in-progress:88, stale:2, existingInNs:70, 
in-progressInNs:88
   [2022-10-11T09:31:51.649Z]initialized: false, 
avgDuration:Some(1.9665046118449014), stateName:Removing, incoming:0, 
current:2, existing:0, in-progress:2, stale:2, existingInNs:72, 
in-progressInNs:88
   [2022-10-11T09:31:51.548Z]initialized: true, 
avgDuration:Some(1.9664630890896282), stateName:Removing, incoming:0, 
current:3, existing:0, in-progress:3, stale:3, existingInNs:71, 
in-progressInNs:88
   [2022-10-11T09:31:51.649Z]initialized: true, 
avgDuration:Some(1.966497872446949), stateName:Removing, incoming:0, 
current:18, existing:0, in-progress:18, stale:18, existingInNs:72, 
in-progressInNs:88
   [2022-10-11T09:31:51.548Z]initialized: true, 
avgDuration:Some(1.9665114532673995), stateName:Removing, incoming:0, 
current:2, existing:0, in-progress:2, stale:2, existingInNs:71, 
in-progressInNs:88
   [2022-10-11T09:31:51.548Z]initialized: true, 
avgDuration:Some(1.9664458988566114), stateName:Removing, incoming:0, 
current:2, existing:0, in-progress:2, stale:2, existingInNs:71, 
in-progressInNs:88
   [2022-10-11T09:31:51.548Z]initialized: true, 
avgDuration:Some(1.9664526436773975), stateName:Removing, incoming:0, 
current:1, existing:0, in-progress:1, stale:1, existingInNs:71, 
in-progressInNs:88
   [2022-10-11T09:31:51.548Z]initialized: false, 
avgDuration:Some(1.9664874298803876), stateName:Removing, incoming:0, 
current:2, existing:0, in-progress:2, stale:2, existingInNs:71, 
in-progressInNs:88
   [2022-10-11T09:31:51.548Z]initialized: true, 
avgDuration:Some(1.9665462308159125), stateName:Removing, incoming:0, 
current:1, existing:0, in-progress:1, stale:1, existingInNs:71, 
in-progressInNs:88
   [2022-10-11T09:31:51.548Z]initialized: true, avgDuration:Some(-48.25), 
stateName:Removing, incoming:0, current:0, existing:47, in-progress:0, stale:0, 
existingInNs:71, in-progressInNs:88
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to