Won't this only benefit invocations that are mostly sleepy, i.e. I/O-bound? If an action uses CPU flat-out, then there is no throughput win to be had by increasing the parallelism of CPU-bound processes, given the small CPU sliver that each container gets -- unless there is a concomitant increase in the container's CPU slice?

If so, then my gut tells me that there are more general solutions to this (i.e. more efficient packing of I/O-bound processes).
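A back-of-the-envelope illustration of that point, using figures consistent with the loadtest runs quoted further down (roughly 250 ms per activation and 10 client connections are assumptions for illustration):

    I/O-bound, ~250 ms mostly spent waiting:
      serial (1 request per container)  ~  4 req/s
      10 overlapped in one container    ~ 40 req/s
    CPU-bound, ~250 ms of actual CPU per request:
      ~  4 req/s however many requests are in flight,
      unless the container's CPU share grows as well

So the concurrency win shows up precisely when an action spends most of its latency off-CPU.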
On Mon, May 1, 2017 at 5:36 PM, Tyson Norris <tnor...@adobe.com> wrote:

> Thanks Markus.
>
> Can you direct me to the travis job where I can see the 40+ RPS? I agree
> that is a big gap and would like to take a look - I didn't see anything in
> https://travis-ci.org/openwhisk/openwhisk/builds/226918375; maybe I'm
> looking in the wrong place.
>
> I will work on putting together a PR to discuss.
>
> Thanks
> Tyson
>
>
> On May 1, 2017, at 2:22 PM, Markus Thömmes <markusthoem...@me.com> wrote:
>
> Hi Tyson,
>
> Sounds like you did a lot of investigation here, thanks a lot for that :)
>
> Seeing the numbers, 4 RPS in the "off" case seems very odd. The Travis
> build that runs the current system as-is also reaches 40+ RPS, so we'd
> need to look at the mismatch here.
>
> Other than that, I'd indeed expect a great improvement in throughput from
> your work!
>
> Implementation-wise I don't have a strong opinion, but it might be worth
> discussing the details first and landing your implementation once all my
> staging is done (the open PRs). That'd ease git operations. If you want
> to discuss your implementation now, I suggest you send a PR to my
> new-containerpool branch and share the diff here for discussion.
>
> Cheers,
> Markus
>
> Sent from my iPhone
>
> On 01.05.2017 at 23:16, Tyson Norris <tnor...@adobe.com> wrote:
>
> Hi Michael -
> Concurrent requests would only reuse a running/warm container for
> same-action requests. So if an action has bad/rogue behavior, it will
> limit its own throughput only, not the throughput of other actions.
>
> This is ignoring the current implementation of the activation feed, which
> I guess is susceptible to a flood of slow-running activations. If those
> activations are for the same action, running them concurrently should be
> enough to keep the system from starving, so that activations for other
> (faster) actions can still be processed. In case they are all different
> actions, or not allowed to execute concurrently, then in the name of
> quality of service it may also be desirable to reserve some resources
> (i.e. separate activation feeds) for known-to-be-faster actions, so that
> fast-running actions are not penalized for existing alongside
> slow-running ones. This would require a more complicated throughput test
> to demonstrate.
>
> Thanks
> Tyson
>
> On May 1, 2017, at 1:13 PM, Michael Marth <mma...@adobe.com> wrote:
>
> Hi Tyson,
>
> 10x more throughput, i.e. being able to run OW at 1/10 of the cost -
> definitely worth looking into :)
>
> Like Rodric mentioned before, I figured some features might become more
> complex to implement, like billing, log collection, etc. But given such a
> huge advancement in throughput, that would be worth it IMHO.
> One thing I wonder about, though, is resilience against rogue actions. If
> an action is blocking (in the Node sense, not the OW sense), would that
> not block Node's event loop and thus block other actions in that
> container? One could argue, though, that this rogue action would only
> block other executions of itself, not affect other actions or customers.
> WDYT?
>
> Michael
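Michael's event-loop concern can be made concrete outside Node. A minimal Scala analogy, with a single-threaded executor standing in for the event loop (illustration only, not OpenWhisk code):

    import java.util.concurrent.Executors
    import scala.concurrent.{ExecutionContext, Future}

    object EventLoopAnalogy extends App {
      // A single-threaded executor behaves like Node's event loop: a task
      // that never yields starves everything queued behind it.
      implicit val loop: ExecutionContext =
        ExecutionContext.fromExecutor(Executors.newSingleThreadExecutor())

      Future { while (true) {} }          // rogue activation: never yields
      Future { println("never printed") } // second activation, now starved

      Thread.sleep(1000)
      // Only work sharing this executor is starved; the main thread (a
      // stand-in for other actions' containers) is unaffected.
      println("other containers keep running")
      sys.exit(0)
    }

This matches the "only blocks other executions of itself" framing: the starvation is confined to whatever shares the rogue action's container.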
> On 01/05/17 17:54, "Tyson Norris" <tnor...@adobe.com> wrote:
>
> Hi All -
> I created this issue some time ago to discuss concurrent requests on
> actions: [1] Some people mentioned discussing it on the mailing list, so
> I wanted to start that discussion.
>
> I've been doing some testing against this branch with Markus's work on
> the new container pool: [2]
> I believe there are a few open PRs upstream related to this work, but
> this seemed like a reasonable place to test against a variety of the
> reactive invoker and pool changes - I'd be interested to hear if anyone
> disagrees.
>
> Recently I ran some tests:
> - with "throughput.sh" in [3] using a concurrency of 10 (it will also be
> interesting to test with the --rps option in loadtest...)
> - using a change that checks actions for a "max-concurrent" annotation
> (in case there is some reason actions want to enforce the current
> behavior of strictly serial invocation per container?)
> - when scheduling an action against the pool, if there is a currently
> "busy" container with this action, AND the annotation is present for
> this action, AND concurrent requests < max-concurrent, then this
> container is used to invoke the action (a rough sketch of this check
> follows below)
>
> Below is a summary (approx. 10x throughput with concurrent requests) and
> I would like to get some feedback on:
> - what are the cases for having actions that require container isolation
> per request? Node is a good example that should NOT need this, but maybe
> there are cases where it is more important, e.g. where stateful actions
> are used?
> - log collection approach: I have not attempted to resolve log
> collection issues. I would expect that revising the log sentinel marker
> to include the activation ID would help; logs stored with an activation
> would include interleaved activations in some cases (which should be
> expected with concurrent request processing?), and some different logic
> would be needed to process logs after an activation completes (e.g. logs
> emitted at the start of an activation may already have been collected as
> part of another activation's log collection, etc.)
> - advice on creating a PR to discuss this in more detail - should I wait
> for more of the container pooling changes to get to master, or submit a
> PR to Markus's new-containerpool branch?
>
> Thanks
> Tyson
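For concreteness, a rough sketch of the scheduling check described in the first list above; the names (Action, BusyContainer, reusableContainer) are hypothetical stand-ins, not the actual ContainerPool API:

    object ConcurrentScheduling {
      // Hypothetical shapes; the real pool tracks much more state.
      case class Action(name: String, annotations: Map[String, String])
      case class BusyContainer(actionName: String, activeRequests: Int)

      // The opt-in limit, read from the action's "max-concurrent" annotation.
      def maxConcurrent(action: Action): Option[Int] =
        action.annotations.get("max-concurrent").map(_.toInt)

      // Reuse a busy container only when the action opted in via the
      // annotation and that container is still below its declared limit;
      // otherwise fall back to the existing one-request-per-container path.
      def reusableContainer(action: Action,
                            busy: List[BusyContainer]): Option[BusyContainer] =
        maxConcurrent(action) match {
          case Some(limit) =>
            busy.find(c => c.actionName == action.name && c.activeRequests < limit)
          case None => None
        }
    }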
> Summary of loadtest report with max-concurrent ENABLED (I used 10000,
> but this limit wasn't reached):
> [Sat Apr 29 2017 16:32:37 GMT+0000 (UTC)] INFO Target URL: https://192.168.99.100/api/v1/namespaces/_/actions/noopThroughputConcurrent?blocking=true
> [Sat Apr 29 2017 16:32:37 GMT+0000 (UTC)] INFO Max requests:        10000
> [Sat Apr 29 2017 16:32:37 GMT+0000 (UTC)] INFO Concurrency level:   10
> [Sat Apr 29 2017 16:32:37 GMT+0000 (UTC)] INFO Agent:               keepalive
> [Sat Apr 29 2017 16:32:37 GMT+0000 (UTC)] INFO
> [Sat Apr 29 2017 16:32:37 GMT+0000 (UTC)] INFO Completed requests:  10000
> [Sat Apr 29 2017 16:32:37 GMT+0000 (UTC)] INFO Total errors:        0
> [Sat Apr 29 2017 16:32:37 GMT+0000 (UTC)] INFO Total time:          241.900480915 s
> [Sat Apr 29 2017 16:32:37 GMT+0000 (UTC)] INFO Requests per second: 41
> [Sat Apr 29 2017 16:32:37 GMT+0000 (UTC)] INFO Mean latency:        241.7 ms
>
> Summary of loadtest report with max-concurrent DISABLED:
> [Sat Apr 29 2017 19:21:51 GMT+0000 (UTC)] INFO Target URL: https://192.168.99.100/api/v1/namespaces/_/actions/noopThroughput?blocking=true
> [Sat Apr 29 2017 19:21:51 GMT+0000 (UTC)] INFO Max requests:        10000
> [Sat Apr 29 2017 19:21:51 GMT+0000 (UTC)] INFO Concurrency level:   10
> [Sat Apr 29 2017 19:21:51 GMT+0000 (UTC)] INFO Agent:               keepalive
> [Sat Apr 29 2017 19:21:51 GMT+0000 (UTC)] INFO
> [Sat Apr 29 2017 19:21:51 GMT+0000 (UTC)] INFO Completed requests:  10000
> [Sat Apr 29 2017 19:21:51 GMT+0000 (UTC)] INFO Total errors:        19
> [Sat Apr 29 2017 19:21:51 GMT+0000 (UTC)] INFO Total time:          2770.658048791 s
> [Sat Apr 29 2017 19:21:51 GMT+0000 (UTC)] INFO Requests per second: 4
> [Sat Apr 29 2017 19:21:51 GMT+0000 (UTC)] INFO Mean latency:        2767.3 ms
>
> [1] https://github.com/openwhisk/openwhisk/issues/2026
> [2] https://github.com/markusthoemmes/openwhisk/tree/new-containerpool
> [3] https://github.com/markusthoemmes/openwhisk-performance
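Side by side, this is the roughly 10x gap discussed above:

                            max-concurrent ON    max-concurrent OFF
    Completed requests      10000                10000
    Total errors            0                    19
    Total time              241.9 s              2770.7 s
    Requests per second     41                   4
    Mean latency            241.7 ms             2767.3 ms

(41 vs 4 in RPS; about 11.5x in total time.)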