Thank you for the answer, Rodric. Now I clearly understand what could go
wrong with that change beyond file access.

But I am still curious how you could meet your target TPS.
IIRC, even if I create more stem cells, container creation and deletion
are inevitable at maximum load.
When the docker daemon starts to create/delete containers, it causes huge
performance degradation, because it only supports about 1~10 TPS for
container creation and deletion.

In my setup, I can run at most 320 concurrent containers, but I got only
30~40 TPS with 100 actions.
Even with 10 actions, I got only 6K TPS.
That is quite low considering my servers' specs.
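As a rough back-of-envelope (my own simplification, assuming every cache
miss costs one create/delete pair serialized through dockerd), the math
looks like this:

    // If each miss requires one dockerd create/delete pair, dockerd's own
    // throughput caps the miss rate, and with it the total TPS.
    val dockerOpsPerSec = 10.0 // rough create/delete throughput of one dockerd
    val missRate        = 0.05 // fraction of invocations needing a cold container
    val maxTps = dockerOpsPerSec / missRate // ~200 TPS per docker daemon

So unless the reuse rate stays extremely close to 100%, dockerd dominates
everything else.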

In a production environment there will be many more users and actions, so
I think the reuse rate will be even lower unless there is a huge number of
machines.

My approach has many security holes, but weighing improving dockerd
performance on one hand against handling the security issues of container
reuse on the other, I am still inclined toward the latter.
I am not sure it is even physically possible to improve dockerd
performance to four-figure TPS or more.
And I have not observed any docker issues in long-term tests with my
approach, since it reduces the load on the docker daemon.

Since that alone is not enough in terms of TPS, I naively thought of
applying it in conjunction with Tyson's PR to maximize TPS.

Anyway, how do you guys guarantee an SLA with the current code base while
serving many users in a public cloud?
Are you operating a huge number of machines?


Thanks
Regards
Dominic



2018-03-29 19:11 GMT+09:00 Rodric Rabbah <rod...@gmail.com>:

> Thanks for the detailed post. Indeed as you’ve observed the container
> scheduler is why openwhisk is different from other faas out there which
> delegate to a container manager.
>
> Your approach of reusing a container across tenants could be achieved by
> creating more stem cells. In a typical production environment you can
> expect a significant amount of container reuse, which reduces the load on
> docker. Further as I think you’ve noted docker latency degrades with load
> and eventually can lock up the vm/host.
>
> In a private deployment of openwhisk you might be able to relax some of
> the security and isolation but I think it’s a nonstarter to reuse a
> container for more than one subject otherwise. The container can be
> entirely tainted: traffic snooped, secrets stolen, or even the code itself
> stolen.
>
> Note that the container isolation already in place serves another purpose:
> isolating bugs related to concurrency.  If your JavaScript code doesn’t
> properly handle asynchrony you might have two actions still running at once
> across different functions. So beyond the security implications,
> performance isolation is also put at risk.
>
> Lastly the PR from Tyson which increases intra container concurrency
> (routing more than one request to the same container at the same time) is
> another approach to increasing density. It limits the reuse though in the
> same way we do today: subject and function.
>
> -r
>
> > On Mar 28, 2018, at 10:59 PM, Dominic Kim <style9...@gmail.com> wrote:
> >
> > Dear all.
> >
> > I have been testing OpenWhisk's performance for the last few months.
> > I want to share my experience with OpenWhisk's performance and discuss
> > the right direction to go from here.
> > I hope this will be a good starting point for improving performance and
> > will help anyone facing similar issues.
> >
> > As per my observation, there are three big parts with performance
> > issues: akka-http, CouchDB, and dockerd.
> > In this thread, I will only focus on dockerd.
> >
> > Since OpenWhisk can serve many different kinds of actions from different
> > namespaces, with different runtimes and container sizes, it is more
> > accurate to assume that many heterogeneous requests come to OpenWhisk.
> > In this sense, using a single action to measure OpenWhisk's performance
> > may not be realistic for a production environment.
> >
> > With the current code base, if we run a benchmark with many different
> > actions, performance is severely degraded.
> > When I tested with 100 actions, I got about 30~40 TPS with 3 invoker
> > machines, each with 40 cores, 128GB memory, and a 2TB SSD.
> > (There were 3 invoker containers on each host, so 9 invokers in total
> > were running.)
> >
> >
> > There could be some differences based on deployment, configuration, the
> > number of components, and so on.
> > But since I got about 20K TPS with the same setup using 1 action, it
> > seems obvious that there is huge performance degradation as the number
> > of actions grows.
> >
> >
> >
> > After deep investigation, I found that the main cause is container
> > reuse, or rather the lack of it.
> > When we use only 1 action, logically all containers are reused.
> > But if we use 100 actions, containers are not fully reused, and deletion
> > and creation of containers occur. (Containers are reused only if the
> > namespace and action name are the same.)
> > Since the docker daemon's performance is poor, this makes TPS drop.
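> > To make the reuse condition concrete, here is a simplified model of the
> > matching that happens today (a sketch in the spirit of ContainerPool,
> > not the actual code):
> >
> >     // Simplified: a warm container is handed back only when both the
> >     // invocation namespace and the action match its previous run.
> >     case class WarmKey(namespace: String, actionName: String)
> >
> >     def canReuse(warm: WarmKey, namespace: String, actionName: String): Boolean =
> >       warm.namespace == namespace && warm.actionName == actionName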
> >
> > When one of my colleagues benchmarked the Docker daemon directly, we
> > observed only 30~40 TPS for `pause/unpause`.
> > (He also performed a similar `pause/unpause` test with runc and got only
> > about 300 TPS.)
> > When we included `run/rm` as well, TPS dropped to 1~10.
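> > For reference, this kind of measurement is essentially a tight loop like
> > the following (a minimal sketch, not my colleague's actual harness; the
> > container name "bench" is hypothetical and must be started beforehand,
> > e.g. `docker run -d --name bench alpine sleep 3600`):
> >
> >     import scala.sys.process._
> >
> >     object PauseBench extends App {
> >       val iterations = 100
> >       val start = System.nanoTime()
> >       for (_ <- 1 to iterations) {
> >         "docker pause bench".!   // run the CLI command, wait for exit
> >         "docker unpause bench".!
> >       }
> >       val elapsedSec = (System.nanoTime() - start) / 1e9
> >       println(f"${iterations * 2 / elapsedSec}%.1f docker ops/sec")
> >     }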
> >
> > This means that if containers are not reused well, dockerd has to
> > create/remove/pause/unpause containers, and TPS drops due to the docker
> > daemon's poor performance.
> > I think this issue exists in all serverless frameworks that are based
> > on docker.
> >
> > So the docker daemon is clearly a performance bottleneck in OpenWhisk,
> > and it is not simple to resolve.
> > Even if a huge performance improvement were made to the docker daemon, I
> > doubt it could support more than 1K ~ 2K TPS.
> > (Docker is not designed for this kind of traffic; creating about 1K
> > containers per second is not really plausible.)
> >
> > So in OpenWhisk, the main factor in performance is container reuse, in
> > other words, reducing the load on the docker daemon.
> > With 1 action, there was not much docker traffic; the invoker just sent
> > `/run` requests to running containers again and again.
> > However, with 100 actions, many containers had to be deleted and
> > created, and this caused huge performance degradation.
> >
> >
> > [graph] Reuse rate: 95%
> >
> > In the graph above, the circled red portion is the ratio of reused
> > (warmed) containers.
> > The reuse rate was more than 95%, but I got only 6K TPS.
> >
> > [graph] 5 namespaces + 2 actions each, with 100 threads
> >
> >
> > As the reuse rate decreases, TPS gets worse.
> >
> > [graph] Reuse rate: 60%
> >
> >
> >
> > [graph] 10 namespaces + 10 actions each, with 400 threads
> >
> >
> >
> >
> > I changed the `ContainerProxy` code to reuse containers based on runtime
> > and container memory size, and also changed the `gracePause` time to 10
> > minutes.
> > So once a container is created, it is not paused for 10 minutes. No
> > matter which namespace a request comes from, if the runtime and memory
> > size are the same, containers are reused.
> > And I got back about 20K TPS with these changes.
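> > In other words, the reuse condition becomes something like this (a
> > sketch of the idea, not my exact patch):
> >
> >     // Relaxed reuse: ignore namespace and action; match only on the
> >     // runtime kind and the memory limit of the container.
> >     case class PoolKey(kind: String, memoryMB: Int)
> >
> >     def canReuse(warm: PoolKey, kind: String, memoryMB: Int): Boolean =
> >       warm.kind == kind && warm.memoryMB == memoryMB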
> >
> > [graph] 100 actions, with 740 threads
> >
> >
> >
> >
> > There could be side effects and hidden defects in this implementation.
> > But in any case, I got about 20K TPS with 100 actions.
> > This is because all 100 actions are nodejs6 actions and require the same
> > container memory size, so all containers are reused.
> >
> > With the current code, TPS decreases as the number of actions or
> > namespaces increases.
> > Since there is no limit on the number of namespaces or the number of
> > actions in them, as more and more namespaces and actions are used,
> > performance gets worse.
> >
> > With this change, the container reuse rate depends only on the runtime
> > and the container memory size.
> > Since OpenWhisk has only 6 runtimes (nodejs6, nodejs8, python, php,
> > swift, java), this can vastly increase the reuse rate.
> > And even as more actions and namespaces are created, there is no
> > performance degradation as long as the same runtimes are used.
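> > The key-space arithmetic shows why (the memory sizes here are just
> > illustrative values, not OpenWhisk's actual limits):
> >
> >     val runtimes    = Seq("nodejs:6", "nodejs:8", "python", "php", "swift", "java")
> >     val memorySizes = Seq(128, 256, 512) // MB; illustrative only
> >     // (runtime, memory) keys give the pool a small, fixed key space:
> >     val relaxedKeys = runtimes.size * memorySizes.size // 18
> >     // whereas (namespace, action) keys grow without bound over time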
> >
> > One more good thing about this implementation is that it reduces the
> > traffic to dockerd; I could run benchmarks long-term under heavy load
> > without any docker daemon problems.
> >
> > Since I am not an expert on OpenWhisk, I want to discuss whether there
> > could be any issues or side effects with this change.
> > With my shallow understanding of OpenWhisk, one security issue could
> > arise when a user accesses files in a container.
> > If a user creates files in their code, those files could be exposed to
> > other users, since the same container can be reused among many users.
> > (But I am not sure OpenWhisk should guarantee that kind of stateful
> > behavior anyway, because serverless is intrinsically stateless.)
> > Other than that, I have not found any other issues so far.
> >
> >
> > Apart from my suggestion, I think one more thing worth trying is this
> > PR:
> > https://github.com/apache/incubator-openwhisk/pull/2795
> >
> >
> > I would like to hear any opinions and feedback.
> >
> >
> > Thanks in advance
> > Regards
> > Dominic.
> >
> >
> >
>
