This is an automated email from the ASF dual-hosted git repository.

astefanutti pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/camel-k.git

commit 1a8c9d3a4018dead556344706435f12c8b51dfa1
Author: Antonin Stefanutti <[email protected]>
AuthorDate: Wed Dec 2 17:11:45 2020 +0100

    chore(doc): Add SOPs for CamelKSuccessBuildDuration5m and CamelKBuildError SLOs
---
 .../ROOT/pages/troubleshooting/operating.adoc | 55 ++++++++++++++++++++--
 1 file changed, 52 insertions(+), 3 deletions(-)

diff --git a/docs/modules/ROOT/pages/troubleshooting/operating.adoc b/docs/modules/ROOT/pages/troubleshooting/operating.adoc
index 80cd352..68ea05f 100644
--- a/docs/modules/ROOT/pages/troubleshooting/operating.adoc
+++ b/docs/modules/ROOT/pages/troubleshooting/operating.adoc
@@ -67,7 +67,7 @@ $ kubectl logs deployment/camel-k-operator --since=1h \
 | "-n \(.namespace) \(.controller | rtrimstr("-controller"))/\(.name)"' \
 | xargs kubectl describe
 ----
-Check the resource events.
+Check the resource specification and events.
 
 * Improve this SOP if there's anything missing, and contact engineering if there are any changes they could make to make this easier in the future.
 
@@ -76,7 +76,7 @@ Check the resource events.
 ==== Description
 
 This alert has severity level of "warning".
-It's firing when more than 1% of the successful builds have their duration above 2 min.
+It's firing when more than 10% of the successful builds have their duration above 2 min.
 
 ==== Troubleshooting
 
@@ -93,6 +93,55 @@ $ kubectl get builds.camel.apache.org -o json \
 | "-n \(.metadata.namespace) builds.camel.apache.org/\(.metadata.name)"' \
 | xargs kubectl describe
 ----
-Check the resource events.
+Check the resource specification and events.
+
+* Improve this SOP if there's anything missing, and contact engineering if there are any changes they could make to make this easier in the future.
+
+=== CamelKSuccessBuildDuration5m
+
+=== Description
+
+This alert has severity level of "critical".
+It's firing when more than 1% of the successful builds have their duration above 5 min.
+
+==== Troubleshooting
+
+* Inspect the successful Builds whose duration is longer than 5 minutes, e.g.:
++
+[source,sh]
+----
+$ kubectl get builds.camel.apache.org -o json \
+| jq -r '.items[]
+| select(.status.phase == "Succeeded")
+| select(.status.duration
+ | "01-Jan-1970 \(sub("(?<time>.*)\\..*"; "\(.time)s"))" | strptime("%d-%b-%Y %Mm%Ss")? // strptime("%d-%b-%Y %Ss")
+ | mktime > 300)
+| "-n \(.metadata.namespace) builds.camel.apache.org/\(.metadata.name)"' \
+| xargs kubectl describe
+----
+Check the resource specification and events.
+
+* Improve this SOP if there's anything missing, and contact engineering if there are any changes they could make to make this easier in the future.
+
+=== CamelKBuildError
+
+=== Description
+
+This alert has severity level of "critical".
+It's firing when more than 1% of the builds have errored over at least 10 min.
+
+==== Troubleshooting
+
+* Inspect the errored Builds, e.g.:
++
+[source,sh]
+----
+$ kubectl get builds.camel.apache.org -o json \
+| jq -r '.items[]
+| select(.status.phase == "Error")
+| "-n \(.metadata.namespace) builds.camel.apache.org/\(.metadata.name)"' \
+| xargs kubectl get -o jsonpath='{.metadata.namespace}{"/"}{.metadata.name}{"\nError: "}{.status.error}{"\n"}'
+----
+Check the resource specification and events.
 
 * Improve this SOP if there's anything missing, and contact engineering if there are any changes they could make to make this easier in the future.
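
The jq filter in the CamelKSuccessBuildDuration5m SOP drops the fractional seconds from `.status.duration`, reads what remains as minutes and seconds, and compares the total against 300. The sketch below performs the same conversion in plain shell for a single value, which may help when checking one Build by hand. It is not part of the commit: `duration_to_seconds` and `exceeds_5m` are hypothetical helper names, and the code assumes the duration only takes the `<M>m<S>s` or `<S>s` shapes that the filter's two `strptime` formats cover.

```sh
#!/bin/sh
# Hypothetical helper (not part of Camel K): convert a Build duration string
# such as "6m12.345s" or "45.2s" to whole seconds, mirroring the jq filter.
# Assumption: only "<M>m<S>s" and "<S>s" shapes occur; hours and sub-second
# units such as "ms" are not handled.
duration_to_seconds() {
  d=${1%s}       # drop the trailing "s" unit
  d=${d%%.*}     # drop fractional seconds: "6m12.345" -> "6m12"
  case $d in
    *m*) min=${d%%m*}; sec=${d#*m} ;;   # "6m12" -> min=6, sec=12
    *)   min=0; sec=$d ;;               # "45"   -> min=0, sec=45
  esac
  echo $(( min * 60 + sec ))
}

# Same threshold as the SOP: succeed only when the duration exceeds 5 minutes.
exceeds_5m() {
  [ "$(duration_to_seconds "$1")" -gt 300 ]
}
```

For example, `exceeds_5m 6m12.345s && echo slow` flags the Build, while a duration of exactly `5m0s` does not pass the check because the comparison is strict.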
