Why not consider giving developers options to control the level of
concurrency, instead of deciding on their behalf? I think that cases such
as the ones Tyson is mentioning make sense; unless we build something that
will estimate the resources needed by an action automatically, letting the
developer specify it instead might be a means of "supervised learning" that
the system can use later on to make decisions at runtime.

Dragos
On Mon, May 1, 2017 at 4:46 PM Tyson Norris <tnor...@adobe.com> wrote:

> Sure, many of our use cases are mostly stitching together API calls, as
> opposed to being CPU bound - consider a simple javascript action that wraps
> a downstream http API (or many APIs).
>
> What do you mean by “more efficient packing of I/O-bound processes”? For
> example, in the case of actions that wrap an API call, the action developer
> typically does NOT own the downstream API, so it's not clear how to handle
> this more efficiently than by creating a nodejs action that proxies
> (multiple concurrent) network requests around but does little actual
> computing besides, possibly, some minor request/response parsing. In our
> case we are much more likely to hit bottlenecks from concurrent users
> without any support for concurrent container usage, unless we greatly
> over-provision clusters, which drastically reduces efficiency. It is much
> simpler to provision for anticipated or immediate load changes when each
> new container can support multiple concurrent requests, rather than each
> new container supporting a single request.
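> 
> For illustration, an action along these lines (a rough sketch; the URLs are
> hypothetical and only Node's built-in https module is used) spends nearly
> all of its time waiting on downstream responses rather than using CPU:
> 
> // Sketch of an I/O-bound nodejs action that fans out to several
> // downstream APIs concurrently and merges the responses.
> const https = require('https');
> 
> function get(url) {
>   // Wrap https.get in a Promise so several requests can be in flight at once.
>   return new Promise((resolve, reject) => {
>     https.get(url, (res) => {
>       let body = '';
>       res.on('data', (chunk) => { body += chunk; });
>       res.on('end', () => resolve(body));
>     }).on('error', reject);
>   });
> }
> 
> function main(params) {
>   // Placeholder URLs, not real services.
>   const urls = [
>     'https://api.example.com/users/' + params.id,
>     'https://api.example.com/orders?user=' + params.id
>   ];
>   // While both requests are pending the action is idle, so the container's
>   // small CPU share is barely used.
>   return Promise.all(urls.map(get)).then(([user, orders]) => ({
>     user: JSON.parse(user),
>     orders: JSON.parse(orders)
>   }));
> }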
>
> More tests demonstrating these cases (e.g. API wrappers and
> compute-centric actions) will help this discussion; I’ll work on providing
> those.
>
> Thanks
> Tyson
>
> > On May 1, 2017, at 3:24 PM, Nick Mitchell <moose...@gmail.com> wrote:
> >
> > won't this only be of benefit for invocations that are mostly sleepy,
> > e.g. I/O-bound? Because if an action uses CPU flat-out, then there is no
> > throughput win to be had (by increasing the parallelism of CPU-bound
> > processes), given the small CPU sliver that each container gets -- unless
> > there is a concomitant increase in the container's CPU slice along with
> > the concurrency?
> >
> > if so, then my gut tells me that there are more general solutions to this
> > (i.e. more efficient packing of I/O-bound processes)
> >
> > On Mon, May 1, 2017 at 5:36 PM, Tyson Norris <tnor...@adobe.com> wrote:
> >
> >> Thanks Markus.
> >>
> >> Can you direct me to the travis job where I can see the 40+ RPS? I agree
> >> that is a big gap and would like to take a look - I didn’t see anything in
> >> https://travis-ci.org/openwhisk/openwhisk/builds/226918375 ; maybe I’m
> >> looking in the wrong place.
> >>
> >> I will work on putting together a PR to discuss.
> >>
> >> Thanks
> >> Tyson
> >>
> >>
> >> On May 1, 2017, at 2:22 PM, Markus Thömmes <markusthoem...@me.com> wrote:
> >>
> >> Hi Tyson,
> >>
> >> Sounds like you did a lot of investigation here, thanks a lot for that :)
> >>
> >> Seeing the numbers, 4 RPS in the "off" case seems very odd. The Travis
> >> build that runs the current system as-is also reaches 40+ RPS. So we'd
> >> need to look at a mismatch here.
> >>
> >> Other than that I'd indeed suspect a great improvement in throughput from
> >> your work!
> >>
> >> Implementation-wise I don't have a strong opinion, but it might be worth
> >> discussing the details first and landing your impl. once all my staging is
> >> done (the open PRs). That'd ease git operations. If you want to discuss
> >> your impl. now, I suggest you send a PR to my new-containerpool branch and
> >> share the diff here for discussion.
> >>
> >> Cheers,
> >> Markus
> >>
> >> Sent from my iPhone
> >>
> >> On 01.05.2017 at 23:16, Tyson Norris <tnor...@adobe.com> wrote:
> >>
> >> Hi Michael -
> >> Concurrent requests would only reuse a running/warm container for
> >> same-action requests. So if the action has bad/rogue behavior, it will
> >> limit its own throughput only, not the throughput of other actions.
> >>
> >> This ignores the current implementation of the activation feed, which I
> >> guess is susceptible to a flood of slow-running activations. If those
> >> activations are all for the same action, running them concurrently should
> >> be enough to keep other activations (with faster actions) from being
> >> starved. If they are all different actions, OR are not allowed to execute
> >> concurrently, then in the name of quality of service it may also be
> >> desirable to reserve some resources (i.e. separate activation feeds) for
> >> known-to-be-faster actions, so that fast-running actions are not penalized
> >> for existing alongside slow-running ones. This would require a more
> >> complicated throughput test to demonstrate.
> >>
> >> Thanks
> >> Tyson
> >>
> >> On May 1, 2017, at 1:13 PM, Michael Marth <mma...@adobe.com> wrote:
> >>
> >> Hi Tyson,
> >>
> >> 10x more throughput, i.e. being able to run OW at 1/10 of the cost -
> >> definitely worth looking into :)
> >>
> >> Like Rodric mentioned before, I figured some features might become more
> >> complex to implement, like billing, log collection, etc. But given such a
> >> huge advancement in throughput, that would be worth it IMHO.
> >> One thing I wonder about, though, is resilience against rogue actions. If
> >> an action is blocking (in the Node sense, not the OW sense), would that
> >> not block Node’s event loop and thus block other actions in that
> >> container? One could argue, though, that this rogue action would only
> >> block other executions of itself, not affect other actions or customers.
> >> WDYT?
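> >>
> >> For example, a contrived action like this (just a sketch) holds the event
> >> loop for the whole duration of the loop, so nothing else scheduled on that
> >> Node process can run until it returns:
> >>
> >> // A "rogue" action: synchronous busy work that never yields to the event
> >> // loop, stalling anything else that shares this Node process.
> >> function main(params) {
> >>   const end = Date.now() + (params.ms || 5000);
> >>   while (Date.now() < end) {
> >>     // spin: purely synchronous, no I/O, no yielding
> >>   }
> >>   return { done: true };
> >> }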
> >>
> >> Michael
> >>
> >> On 01/05/17 17:54, "Tyson Norris" <tnor...@adobe.com> wrote:
> >>
> >> Hi All -
> >> I created this issue some time ago to discuss concurrent requests on
> >> actions: [1]. Some people mentioned discussing it on the mailing list, so
> >> I wanted to start that discussion here.
> >>
> >> I’ve been doing some testing against this branch with Markus’s work on
> >> the new container pool: [2]
> >> I believe there are a few open PRs in upstream related to this work, but
> >> this seemed like a reasonable place to test against a variety of the
> >> reactive invoker and pool changes - I’d be interested to hear if anyone
> >> disagrees.
> >>
> >> Recently I ran some tests:
> >> - with “throughput.sh” in [3] using concurrency of 10 (it will also be
> >> interesting to test with the --rps option in loadtest...)
> >> - using a change that checks actions for an annotation “max-concurrent”
> >> (in case there is some reason actions want to enforce the current
> >> behavior of strict serial invocation per container?)
> >> - when scheduling an action against the pool, if there is a currently
> >> “busy” container with this action, AND the annotation is present for this
> >> action, AND concurrent requests < max-concurrent, then this container is
> >> used to invoke the action (see the sketch below)
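> >>
> >> The scheduling check itself is roughly the following (illustrative
> >> JavaScript only, not the actual Scala pool code; the container/action
> >> field names are made up for the sketch):
> >>
> >> // True if a warm-but-busy container may take another request for the
> >> // same action, based on the proposed "max-concurrent" annotation.
> >> function canScheduleConcurrently(container, action) {
> >>   const maxConcurrent = action.annotations['max-concurrent'];
> >>   return maxConcurrent !== undefined &&
> >>     container.action === action.name &&
> >>     container.activeRequests < maxConcurrent;
> >> }
> >>
> >> // e.g. canScheduleConcurrently(
> >> //   { action: 'noopThroughputConcurrent', activeRequests: 3 },
> >> //   { name: 'noopThroughputConcurrent',
> >> //     annotations: { 'max-concurrent': 10000 } })  // -> true
> >>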
> >> Below is a summary (approx 10x throughput with concurrent requests), and
> >> I would like to get some feedback on:
> >> - what are the cases for having actions that require container isolation
> >> per request? node is a good example that should NOT need this, but maybe
> >> there are cases where it is more important, e.g. where stateful actions
> >> are used?
> >> - log collection approach: I have not attempted to resolve log collection
> >> issues; I would expect that revising the log sentinel marker to include
> >> the activation ID would help. Logs stored with an activation would include
> >> interleaved activations in some cases (which should be expected with
> >> concurrent request processing?), and would require some different logic to
> >> process logs after an activation completes (e.g. logs emitted at the start
> >> of an activation may have already been collected as part of another
> >> activation’s log collection, etc) - see the sketch after this list.
> >> - advice on creating a PR to discuss this in more detail - should I wait
> >> for more of the container pooling changes to get to master, or submit a PR
> >> to Markus’s new-containerpool branch?
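> >>
> >> For the log interleaving point above, the post-processing I have in mind
> >> is roughly this (illustrative only; the line format with a leading
> >> activation ID is just an assumption for the sketch, not the current
> >> sentinel format):
> >>
> >> // Assumes each log line a container emits is prefixed with the activation
> >> // ID it belongs to, e.g. "<activationId> <message>". Groups interleaved
> >> // lines so each activation keeps only its own logs.
> >> function groupLogsByActivation(lines) {
> >>   const byActivation = {};
> >>   for (const line of lines) {
> >>     const space = line.indexOf(' ');
> >>     const activationId = line.slice(0, space);
> >>     const message = line.slice(space + 1);
> >>     (byActivation[activationId] = byActivation[activationId] || []).push(message);
> >>   }
> >>   return byActivation;
> >> }
> >>
> >> // e.g. groupLogsByActivation(['a1 start', 'a2 start', 'a1 end', 'a2 end'])
> >> //   -> { a1: ['start', 'end'], a2: ['start', 'end'] }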
> >>
> >> Thanks
> >> Tyson
> >>
> >> Summary of loadtest report with max-concurrent ENABLED (I used 10000, but
> >> this limit wasn’t reached):
> >> [Sat Apr 29 2017 16:32:37 GMT+0000 (UTC)] INFO Target URL:          https://192.168.99.100/api/v1/namespaces/_/actions/noopThroughputConcurrent?blocking=true
> >> [Sat Apr 29 2017 16:32:37 GMT+0000 (UTC)] INFO Max requests:        10000
> >> [Sat Apr 29 2017 16:32:37 GMT+0000 (UTC)] INFO Concurrency level:   10
> >> [Sat Apr 29 2017 16:32:37 GMT+0000 (UTC)] INFO Agent:               keepalive
> >> [Sat Apr 29 2017 16:32:37 GMT+0000 (UTC)] INFO
> >> [Sat Apr 29 2017 16:32:37 GMT+0000 (UTC)] INFO Completed requests:  10000
> >> [Sat Apr 29 2017 16:32:37 GMT+0000 (UTC)] INFO Total errors:        0
> >> [Sat Apr 29 2017 16:32:37 GMT+0000 (UTC)] INFO Total time:          241.900480915 s
> >> [Sat Apr 29 2017 16:32:37 GMT+0000 (UTC)] INFO Requests per second: 41
> >> [Sat Apr 29 2017 16:32:37 GMT+0000 (UTC)] INFO Mean latency:        241.7 ms
> >>
> >> Summary of loadtest report with max-concurrent DISABLED:
> >> [Sat Apr 29 2017 19:21:51 GMT+0000 (UTC)] INFO Target URL:          https://192.168.99.100/api/v1/namespaces/_/actions/noopThroughput?blocking=true
> >> [Sat Apr 29 2017 19:21:51 GMT+0000 (UTC)] INFO Max requests:        10000
> >> [Sat Apr 29 2017 19:21:51 GMT+0000 (UTC)] INFO Concurrency level:   10
> >> [Sat Apr 29 2017 19:21:51 GMT+0000 (UTC)] INFO Agent:               keepalive
> >> [Sat Apr 29 2017 19:21:51 GMT+0000 (UTC)] INFO
> >> [Sat Apr 29 2017 19:21:51 GMT+0000 (UTC)] INFO Completed requests:  10000
> >> [Sat Apr 29 2017 19:21:51 GMT+0000 (UTC)] INFO Total errors:        19
> >> [Sat Apr 29 2017 19:21:51 GMT+0000 (UTC)] INFO Total time:          2770.658048791 s
> >> [Sat Apr 29 2017 19:21:51 GMT+0000 (UTC)] INFO Requests per second: 4
> >> [Sat Apr 29 2017 19:21:51 GMT+0000 (UTC)] INFO Mean latency:        2767.3 ms
> >>
> >> [1] https://github.com/openwhisk/openwhisk/issues/2026
> >> [2] https://github.com/markusthoemmes/openwhisk/tree/new-containerpool
> >> [3] https://github.com/markusthoemmes/openwhisk-performance