singhsegv opened a new issue, #5510:
URL: https://github.com/apache/openwhisk/issues/5510

   I am very confused about how OpenWhisk 2.0.0 is meant to be deployed for a 
scalable benchmarking setup. I need some help from the maintainers to 
understand what I am missing: I've spent a large amount of time on this and am 
still missing some key pieces.
   
   ### Context
   We are using OpenWhisk for a research project where workflows (sequential as 
well as fork/join) are to be deployed and benchmarked at 1, 4, 8 RPS, etc., over 
long periods of time. The goal is to compare private-cloud FaaS with public-cloud 
FaaS.
   
   ### Current Infrastructure Setting
   We have an in-house cluster of around 10 VMs running on different nodes, with 
50 vCPUs and around 200 GB of memory. Since I am new to this, I initially 
followed https://github.com/apache/openwhisk-deploy-kube to deploy it and, 
along with OpenWhisk Composer, was able to get the workflows running after a lot 
of small fixes and changes.
   
   ### Problems with Current Infrastructure
   1. I am not able to scale it properly. Running at even 1 RPS for 5-10 minutes 
leads to many seemingly random errors, such as "failed to get binary", along 
with other errors that don't occur when a workflow is run once manually.
   2. Even when I reduce the benchmarking duration to 10-20 s, the inter-action 
communication time comes out to around 1.5-2 minutes. Grafana shows around 2 
minutes being spent in `/init`, and I am unable to debug why that is happening.
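   For reference when chasing the `/init` delay: each OpenWhisk activation 
record carries a `duration` field plus a `waitTime` annotation and, on cold 
starts only, an `initTime` annotation (all in milliseconds). Below is a minimal 
sketch, assuming the record's JSON has already been fetched (for example with 
`wsk activation get <id>`) and parsed into a dict; the helper name 
`latency_breakdown` is hypothetical, not part of any OpenWhisk tooling.

```python
from typing import Any, Dict


def latency_breakdown(activation: Dict[str, Any]) -> Dict[str, Any]:
    """Summarise the timing fields of one OpenWhisk activation record (ms).

    Assumes a record with a top-level "duration" and an "annotations"
    list of {"key": ..., "value": ...} pairs.
    """
    # Flatten the annotation list into a plain key -> value mapping.
    ann = {a["key"]: a["value"] for a in activation.get("annotations", [])}
    return {
        "waitTime": ann.get("waitTime"),  # time queued inside OpenWhisk before running
        "initTime": ann.get("initTime"),  # only present when the action was cold-started
        "duration": activation.get("duration"),
        "coldStart": "initTime" in ann,
    }


if __name__ == "__main__":
    # Illustrative values only, not measurements from this cluster.
    record = {
        "duration": 1500,
        "annotations": [
            {"key": "waitTime", "value": 30},
            {"key": "initTime", "value": 1200},
        ],
    }
    print(latency_breakdown(record))
```

   Comparing `initTime` against `waitTime` across many activations could help 
indicate whether the ~2 minutes is spent initializing containers or queuing 
before a container is assigned.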
   
   ### Main Doubts about scaling
   1. Since openwhisk-deploy-kube uses a very old version of OpenWhisk, I thought 
running the latest version without k8s on a single machine might give some 
benefits. But what I understand now is that standalone mode is not meant to be 
scalable, since the controller is responsible for a lot of things in v1.0.0; I 
haven't checked whether that is still the case in 2.0.0.
   2. Since deploy-kube doesn't support the latest version of OpenWhisk due to 
major changes in the scheduler, how is OpenWhisk supposed to be deployed on a 
scalable infrastructure? Is there some documentation that I've missed?
   3. Also, regarding the results I've gotten on 1.0.0: is there something I am 
missing? Why aren't the workflows scaling? How should I go about debugging the 
delay? Or is it purely that more infrastructure needs to be added?
   
   @style95 @dgrove-oss Since you have been active in the community and have 
answered some of my previous queries, any help on this would be much 
appreciated.
   
   We are planning to go all in with OpenWhisk for our research and plan to 
contribute some good changes back to the community relating to FaaS at the edge 
and to improving communication times in FaaS. But since none of us has 
infrastructure as our strong suit, getting over these initial hiccups is 
becoming a blocker for us. So we are looking forward to some help, thanks :).
