Hi all,
I’ve been lurking a bit on this thread, but haven’t had time to fully digest
all the issues.
I’d suggest that the first step is to support “multiple heterogeneous resource
pools”, where a resource pool is a set of invokers managed by a load balancer.
There are lots of reasons we may want to support invokers with different
flavors: long-running actions, invokers in a VPN, invokers with GPUs,
invokers with big memory, invokers which support concurrent execution, etc.
If we had a general way to plug in a new resource pool, folks could feel
free to experiment with any new flavors they like without having to debate the
implications on other flavors.
I tend to doubt that there is a “one size fits all” solution here, so I’d
suggest we bite the bullet and engineer for heterogeneity.
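As a rough illustration of what I mean by a pluggable resource pool, here is a minimal Python sketch. None of these names (ResourcePool, PoolRegistry, the capability strings, the invoker names) exist in OpenWhisk; they are made up for illustration, and round-robin is only a placeholder for whatever policy a pool's load balancer would actually use.

```python
from dataclasses import dataclass, field
from itertools import cycle
from typing import Iterator


@dataclass
class ResourcePool:
    """Hypothetical model: a set of invokers behind its own load balancer."""
    name: str
    capabilities: frozenset[str]   # e.g. {"gpu"}, {"concurrent"}; assumed labels
    invokers: list[str]
    _next: Iterator[str] = field(init=False, repr=False)

    def __post_init__(self) -> None:
        # Round-robin stands in for the pool's real balancing policy.
        self._next = cycle(self.invokers)

    def schedule(self) -> str:
        """Pick the invoker that should run the next activation."""
        return next(self._next)


class PoolRegistry:
    """Routes an action to the first registered pool that meets its needs."""

    def __init__(self) -> None:
        self._pools: list[ResourcePool] = []

    def register(self, pool: ResourcePool) -> None:
        self._pools.append(pool)

    def route(self, required: set[str]) -> tuple[str, str]:
        for pool in self._pools:
            # A pool qualifies if it offers every capability the action asks for.
            if required <= pool.capabilities:
                return pool.name, pool.schedule()
        raise LookupError(f"no pool offers {sorted(required)}")


registry = PoolRegistry()
registry.register(ResourcePool("default", frozenset(), ["invoker0", "invoker1"]))
registry.register(ResourcePool("gpu", frozenset({"gpu"}), ["gpu-invoker0"]))

print(registry.route(set()))      # an action with no special requirements
print(registry.route({"gpu"}))    # an action that asks for a GPU invoker
```

The point of the sketch is only that a new flavor becomes one more register() call, so adding it does not touch how the other pools schedule their work.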
SJF
> On Jul 4, 2017, at 9:55 AM, Michael Marth <[email protected]> wrote:
>
> Hi Jeremias, all,
>
> Tyson and Dragos are travelling this week, so I don’t know when they will be
> able to respond. I have worked with them on this topic, so let me jump in and
> comment until they are able to reply.
>
> From my POV having a call like you suggest is a really good idea. Let’s wait
> for Tyson & Dragos to chime in to find a date.
>
> As you mention the discussion so far was jumping across different topics,
> especially the use case, the problem to be solved and the proposed solution.
> In preparation for the call I think we can clarify use case and problem on the
> list. Here’s my view:
>
> Use Case
>
> For us the use case can be summarised with “dynamic, high performance
> websites/mobile apps”. This implies:
> 1. High concurrency, i.e. many requests coming in at the same time.
> 2. The code to be executed is the same across these different requests
> (as opposed to a long-tail distribution of many different actions being
> executed concurrently). In our case “many” would mean “hundreds” or a few
> thousand.
> 3. The latency (time to start execution) matters, because human users are
> waiting for the response. Ideally, at these orders of magnitude of concurrent
> requests the latency should not change much.
>
> All 3 requirements need to be satisfied for this use case.
> In the discussion so far it was mentioned that there are other use cases
> which might have similar requirements. That’s great and I do not want to rule
> them out, obviously. The above is just to make clear where we are
> coming from.
>
> At this point I would like to mention that it is my understanding that this
> use case is within OpenWhisk’s strike zone, i.e. something that we all think
> is reasonable to support. Please speak up if you disagree.
>
> The Problem
>
> One can look at the problem in two ways:
> Either you keep the resources of the OW system constant (i.e. no scaling). In
> that case latency increases very quickly, as demonstrated by Tyson’s tests.
> Or you increase the system’s capacity. In that case the number of machines
> needed to satisfy this use case quickly becomes prohibitively expensive to run for the
> OW operator – where expensive is defined as “compared to traditional web
> servers” (in our case a standard Node.js server). Meaning, you need 100-1000
> concurrent action containers to serve what can be served by 1 or 2 Node.js
> containers.
>
> Of course, the proposed solution is not a fundamental “fix” for the above. It
> would only move the needle ~2 orders of magnitude – so that the current
> problem would not be a problem in reality anymore (and simply remain as a
> theoretical problem). For me that would be good enough.
>
> The solution approach
>
> I would rather not comment on the proposed solution’s details (and leave that
> to Dragos and Tyson). However, it was mentioned that the approach would
> change the programming model for users:
> Our mindset and approach was that we explicitly do not want to change how
> OpenWhisk exposes itself to users. Meaning, users should still be able to use
> NPMs, etc., i.e. this would be an internal implementation detail that is not
> visible to users. (We can make things more explicit to users and e.g. have
> them request a special concurrent runtime if we wish to do so – so far we
> tried to make it transparent to users, though.)
>
> Many thanks
> Michael
>
>
>
> On 03/07/17 14:48, "Jeremias Werner"
> <[email protected]> wrote:
>
> Hi
>
> Thanks for the write-up and the proposal. I think this is a nice idea and
> sounds like a nice way of increasing throughput. Reading through the thread
> it feels like there are different topics/problems mixed-up and the
> discussion is becoming very complex already.
>
> Therefore I would like to suggest that we streamline the discussion a bit,
> maybe in a zoom.us session where we first give Tyson and Dragos the chance
> to walk through the proposal and clarify questions of the audience. Once we
> are all on the same page we could think of a discussion about the benefits
> (improved throughput, latency) vs. challenges (resource sharing, crash
> model, container lifetime, programming model) on the core of the proposal:
> running multiple activations in a single user container. Once we have a
> common understanding on that part we could step-up in the architecture and
> discuss what's needed on higher components like invoker/load-balancer to
> get this integrated.
>
> (I said zoom.us session since I liked the one we had a few weeks ago. It
> was efficient and interactive. If you like, I could volunteer to set up the
> session and/or write the script/summary.)
>
> What do you think?
>
> Many thanks in advance!
>
> Jeremias
>
>
> On Sun, Jul 2, 2017 at 5:43 PM, Rodric Rabbah
> <[email protected]> wrote:
>
> By saying “event driven” you're discounting all the use cases that are still
> latency sensitive because they complete a response by callback or actuation at
> completion. IoT, chatbots and notifications are all examples, in addition to UI,
> that are latency sensitive, and having uniform expectations on queuing time
> is of value.
>
> -r
>