style95 commented on code in PR #5338:
URL: https://github.com/apache/openwhisk/pull/5338#discussion_r1003919541


##########
ansible/roles/controller/tasks/deploy.yml:
##########
@@ -363,6 +365,53 @@
   include_tasks: "lean.yml"
   when: lean
 
+# Before redeploy controller, should remove that controller instance from nginx
+- name: remove the controller from nginx's upstream configuration
+  shell:
+    docker exec -t nginx sh -c "sed -i  \"s/ server {{ ansible_host }}:{{ controller.basePort + (controller_index | int) }}/ \#server {{ ansible_host }}:{{ controller.basePort + (controller_index | int) }}/g\"  /etc/nginx/nginx.conf  && nginx -s reload"

Review Comment:
   Personally, I don't like this approach, but this is a limitation because open-source nginx does not support active health checks by default.
   It is related to https://github.com/apache/openwhisk/issues/5337
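
   For illustration only (the upstream name and addresses below are made up, not taken from the actual nginx template), the sed above just comments out the `server` entry of the controller being redeployed so nginx stops sending it new requests:

   ```
   upstream controllers {
       # before: both controllers receive traffic
       server 172.17.0.10:10001;
       server 172.17.0.11:10002;
   }

   # after the sed and `nginx -s reload`:
   upstream controllers {
       #server 172.17.0.10:10001;
       server 172.17.0.11:10002;
   }
   ```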



##########
ansible/roles/controller/tasks/deploy.yml:
##########
@@ -363,6 +365,53 @@
   include_tasks: "lean.yml"
   when: lean
 
+# Before redeploy controller, should remove that controller instance from nginx
+- name: remove the controller from nginx's upstream configuration
+  shell:
+    docker exec -t nginx sh -c "sed -i  \"s/ server {{ ansible_host }}:{{ controller.basePort + (controller_index | int) }}/ \#server {{ ansible_host }}:{{ controller.basePort + (controller_index | int) }}/g\"  /etc/nginx/nginx.conf  && nginx -s reload"
+  delegate_to: "{{ item }}"
+  with_items: "{{ groups['edge'] }}"
+  when: zeroDowntimeDeployment.enabled == true
+
+- name: wait some time for controllers fire all existing triggers
+  shell: sleep 5s
+  when: zeroDowntimeDeployment.enabled == true
+
+- name: wait until {{ controller_name }} executes all existing activations
+  uri:
+    url: "{{ controller.protocol }}://{{ ansible_host }}:{{ controller.basePort + (controller_index | int) }}/activation/count"

Review Comment:
   The controller will not be redeployed until there are no in-flight activations.
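
   For illustration, this is the kind of wait Ansible expresses with `until`/`retries` polling; a minimal sketch (the retry counts, auth options, and registered variable name are made up here, not necessarily what this PR uses):

   ```yaml
   - name: wait until {{ controller_name }} has no in-flight activations
     uri:
       url: "{{ controller.protocol }}://{{ ansible_host }}:{{ controller.basePort + (controller_index | int) }}/activation/count"
       return_content: yes
     register: activationCount
     # keep polling the endpoint until it reports zero in-flight activations
     until: activationCount.content | int == 0
     retries: 60
     delay: 5
     when: zeroDowntimeDeployment.enabled == true
   ```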
   



##########
ansible/roles/invoker/tasks/clean.yml:
##########
@@ -22,6 +22,37 @@
    invoker_name: "{{ name_prefix ~ ((invoker_index_base | int) + host_group.index(inventory_hostname)) }}"
    invoker_index: "{{ (invoker_index_base | int) + host_group.index(inventory_hostname) }}"
 
+- name: disable invoker{{ groups['invokers'].index(inventory_hostname) }}
+  uri:
+    url: "{{ invoker.protocol }}://{{ ansible_host }}:{{ invoker.port + groups['invokers'].index(inventory_hostname) }}/disable"
+    validate_certs: no
+    client_key: "{{ invoker.confdir }}/invoker{{ groups['invokers'].index(inventory_hostname) }}/{{ invoker.ssl.key }}"
+    client_cert: "{{ invoker.confdir }}/invoker{{ groups['invokers'].index(inventory_hostname) }}/{{ invoker.ssl.cert }}"
+    method: POST
+    status_code: 200
+    user: "{{ invoker.username }}"
+    password: "{{ invoker.password }}"
+    force_basic_auth: yes
+  ignore_errors: "{{ invoker.deployment.ignore_error }}"
+  when: zeroDowntimeDeployment.enabled == true and enable_scheduler
+
+- name: wait invoker{{ groups['invokers'].index(inventory_hostname) }} to clean up all existing containers
+  uri:
+    url: "{{ invoker.protocol }}://{{ ansible_host }}:{{ invoker.port + groups['invokers'].index(inventory_hostname) }}/status/count"

Review Comment:
   The invoker is not deployed until all containers are removed.



##########
ansible/roles/schedulers/tasks/deploy.yml:
##########
@@ -295,27 +290,23 @@
     password: "{{ scheduler.password }}"
     force_basic_auth: yes
   ignore_errors: "{{ scheduler.deployment_ignore_error }}"
-  when: zeroDowntimeDeployment.enabled == true and schedulerDeployed.stdout != "0"
+  when: zeroDowntimeDeployment.enabled == true
 
-- name: wait until all queue and create queue task is finished before redeploy scheduler when using apicall solution or half solution
+- name: wait until all activation is finished before redeploy scheduler
   uri:
-    url: "{{ scheduler.protocol }}://{{ ansible_host }}:{{ scheduler_port }}/queue/total"
+    url: "{{ scheduler.protocol }}://{{ ansible_host }}:{{ scheduler_port }}/activation/count"

Review Comment:
   It's the same as for the other components.
   The deployment waits until all activations are handled before redeploying the scheduler.
   



##########
core/controller/src/main/scala/org/apache/openwhisk/core/controller/Controller.scala:
##########
@@ -175,6 +193,30 @@ class Controller(val instance: ControllerInstanceId,
     LogLimit.config,
     runtimes,
     List(apiV1.basepath()))
+
+  private val controllerUsername = loadConfigOrThrow[String](ConfigKeys.whiskControllerUsername)
+  private val controllerPassword = loadConfigOrThrow[String](ConfigKeys.whiskControllerPassword)
+
+  /**
+   * disable controller
+   */
+  private def disable(implicit transid: TransactionId) = {

Review Comment:
   According to [this step](https://github.com/apache/openwhisk/pull/5338/files#diff-379956107a78f18274d3895bbbb268b964cc5f89ac3601333c4255cc0bb7632dR382), the controller does not stop until all activations are handled.
   
   So the disabling step is not strictly necessary for controllers, but I added it for safety.
   



##########
core/controller/src/main/scala/org/apache/openwhisk/core/controller/Controller.scala:
##########
@@ -175,6 +193,30 @@ class Controller(val instance: ControllerInstanceId,
     LogLimit.config,
     runtimes,
     List(apiV1.basepath()))
+
+  private val controllerUsername = loadConfigOrThrow[String](ConfigKeys.whiskControllerUsername)
+  private val controllerPassword = loadConfigOrThrow[String](ConfigKeys.whiskControllerPassword)
+
+  /**
+   * disable controller
+   */
+  private def disable(implicit transid: TransactionId) = {
+    implicit val executionContext = actorSystem.dispatcher
+    implicit val jsonPrettyResponsePrinter = PrettyPrinter
+    (path("disable") & post) {
+      extractCredentials {
+        case Some(BasicHttpCredentials(username, password)) =>
+          if (username == controllerUsername && password == controllerPassword) {
+            loadBalancer.close

Review Comment:
   > I think it would be a good idea to have an isEnabled route similar to the other services so that the external monitor that is determining whether the service is alive to direct traffic to the node can make the decision to stop directing traffic once disabled.
   
   That sounds good, but it's not easy to add with Nginx.
   Take a look at [this issue](https://github.com/apache/openwhisk/issues/5337).
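
   If we did add such a route anyway, a minimal sketch could look like the following, assuming it sits next to the `disable` route above and that spray-json and the akka-http directives are in scope as elsewhere in Controller.scala; the `disabled` flag, route name, and response shape are illustrative assumptions, not code from this PR:

   ```scala
   // Hypothetical sketch only (not this PR's code): track the disabled state and
   // expose it, so an external health checker could stop routing traffic to a
   // controller once it has been disabled.
   private val disabled = new java.util.concurrent.atomic.AtomicBoolean(false)

   private def isEnabled(implicit transid: TransactionId) = {
     (path("isEnabled") & get) {
       complete(JsObject("isEnabled" -> JsBoolean(!disabled.get())).compactPrint)
     }
   }
   ```

   The `disable` route would then also flip the flag (e.g. `disabled.set(true)`) before calling `loadBalancer.close`, so the reported state changes as soon as the controller is disabled.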
   


