2022-07-06 20:52:20 UTC - piby: Hey all, we are evaluating different serverless platforms for our k8s cluster. We spent a couple of hours today trying to install OpenWhisk on EKS 1.20 but unfortunately weren't able to make it work.
There are limited logs and multiple containers are in “pod initializing” state with no way to debug it. Any help would be super useful to us. Thanks!
values.yaml
```
whisk:
  ingress:
    # NOTE: Replace <domain> with your cluster's actual domain
    apiHostName: test.xxx.xxx.com
    apiHostPort: 443
    apiHostProto: https
    type: Standard
    useInternally: false
    # NOTE: Replace <domain> with your cluster's actual domain
    domain: test.xxx.xxx.com
invoker:
  options: "-Dwhisk.kubernetes.user-pod-node-affinity.enabled=false"
  containerFactory:
    impl: kubernetes
affinity:
  enabled: false
toleration:
  enabled: false
k8s:
  domain: cluster.local
  dns: kube-dns.kube-system
  persistence:
    enabled: true
    hasDefaultStorageClass: false
    explicitStorageClass: efs-csi-openwhisk
metrics:
  # set true to enable prometheus exporter
  prometheusEnabled: true
  # passing prometheus-enabled by a config file, required by openwhisk
  whiskconfigFile: "whiskconfig.conf"
  # set true to enable Kamon
  kamonEnabled: false
  # set true to enable Kamon tags
  kamonTags: false
  # set true to enable user metrics
  userMetricsEnabled: true
```
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1657140740810409
----
2022-07-06 21:08:03 UTC - Bilal: I have a self-managed OpenWhisk deployment running in EKS (kube). Currently we are doing just over 100,000 activations per day, hitting about a 0.5% system error rate with response code 3: Failed to run container. The majority of my actions are blackbox (I have blackbox percent set to 100%), however they are small Dockerfiles that simply extend existing OW python containers by installing a few more packages (e.g. `pip install redis`). At one point I had a 0% system error rate. I've done most of the recommendations here (https://github.com/apache/openwhisk-deploy-kube/blob/master/docs/k8s-custom-build-cluster-scaleup.md), and I assume at this point I'm large scale.
Linking values in :thread:. At this point I'm not sure if there's an obvious config that I missed or if there are additional considerations at this scale? I have replicaCount set to 4 for controller/invoker but only 1 for the elasticsearch activationStoreBackend. Not sure if that should also be increased.
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1657141683129629?thread_ts=1657141683.129629&cid=C3TPCAQG1
----
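For reference, the figures quoted above (just over 100,000 activations per day at roughly 0.5% system errors) work out to about 500 failed activations per day. A minimal sketch of tallying that rate from exported activation records follows. The record shape (one dict per activation with an integer `statusCode` key) and the helper name `system_error_rate` are assumptions for illustration, not something confirmed in the thread, and the sample data is synthetic.

```python
from collections import Counter

# Assumption: each activation record is a dict with an integer "statusCode",
# and code 3 marks a system error, following the message above.
SYSTEM_ERROR_CODE = 3


def system_error_rate(activations):
    """Return (counts per status code, fraction that are system errors)."""
    counts = Counter(a["statusCode"] for a in activations)
    total = sum(counts.values())
    rate = counts[SYSTEM_ERROR_CODE] / total if total else 0.0
    return counts, rate


# Synthetic data sized to the quoted numbers: 100,000 activations, 0.5% system errors.
sample = [{"statusCode": 0}] * 99_500 + [{"statusCode": 3}] * 500
counts, rate = system_error_rate(sample)
print(counts[SYSTEM_ERROR_CODE], f"{rate:.2%}")
```

Tracking this number per invoker or per action image, rather than only cluster-wide, may help narrow down whether the failures cluster around particular nodes or blackbox images.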