Thanks Markus.

Can you direct me to the Travis job where I can see the 40+ RPS? I agree that is 
a big gap and would like to take a look - I didn’t see anything in 
https://travis-ci.org/openwhisk/openwhisk/builds/226918375; maybe I’m looking 
in the wrong place.

I will work on putting together a PR to discuss.

Thanks
Tyson


On May 1, 2017, at 2:22 PM, Markus Thömmes <markusthoem...@me.com> wrote:

Hi Tyson,

Sounds like you did a lot of investigation here, thanks a lot for that :)

Seeing the numbers, 4 RPS in the "off" case seems very odd. The Travis build 
that runs the current system as-is also reaches 40+ RPS, so we'd need to look 
into that mismatch.

Other than that I'd indeed suspect a great improvement in throughput from your 
work!

Implementation-wise I don't have a strong opinion, but it might be worth 
discussing the details first and landing your implementation once all my 
staging is done (the open PRs). That would ease the git operations. If you want 
to discuss your implementation now, I suggest you send a PR to my 
new-containerpool branch and share the diff here for discussion.

Cheers,
Markus

Sent from my iPhone

On May 1, 2017, at 11:16 PM, Tyson Norris <tnor...@adobe.com> wrote:

Hi Michael -
Concurrent requests would only reuse a running/warm container for same-action 
requests. So if an action has bad/rogue behavior, it will limit only its own 
throughput, not the throughput of other actions.

This ignores the current implementation of the activation feed, which I guess 
is susceptible to a flood of slow-running activations. If those activations are 
all for the same action, running them concurrently should be enough to keep the 
system from starving other activations (with faster actions) waiting to be 
processed. If they are all different actions, or are not allowed to execute 
concurrently, then in the name of quality of service it may also be desirable 
to reserve some resources (i.e. separate activation feeds) for 
known-to-be-faster actions, so that fast-running actions are not penalized for 
existing alongside slow-running ones. This would require a more complicated 
throughput test to demonstrate.

Thanks
Tyson


On May 1, 2017, at 1:13 PM, Michael Marth <mma...@adobe.com> wrote:

Hi Tyson,

10x more throughput, i.e. being able to run OW at 1/10 of the cost - definitely 
worth looking into :)

Like Rodric mentioned before, I figured some features might become more complex 
to implement, like billing, log collection, etc. But given such a huge 
improvement in throughput, that would be worth it IMHO.
One thing I wonder about, though, is resilience against rogue actions. If an 
action is blocking (in the Node-sense, not the OW-sense), would that not block 
Node’s event loop and thus block other actions in that container? One could 
argue, though, that this rogue action would only block other executions of 
itself, not affect other actions or customers. WDYT?

Michael




On 01/05/17 17:54, "Tyson Norris" <tnor...@adobe.com> wrote:

Hi All -
I created this issue some time ago to discuss concurrent requests on actions: 
[1]. Some people mentioned discussing it on the mailing list, so I wanted to 
start that discussion here.

I’ve been doing some testing against this branch with Markus’s work on the new 
container pool: [2]. I believe there are a few open PRs upstream related to 
this work, but this seemed like a reasonable place to test against a variety of 
the reactive invoker and pool changes - I’d be interested to hear if anyone 
disagrees.

Recently I ran some tests:
- with “throughput.sh” in [3] using a concurrency of 10 (it will also be 
interesting to test with the --rps option in loadtest...)
- using a change that checks actions for a “max-concurrent” annotation (in 
case there is some reason actions want to enforce the current behavior of 
strict serial invocation per container?)
- when scheduling an action against the pool, if there is a currently “busy” 
container for this action, AND the annotation is present for this action, AND 
concurrent requests < max-concurrent, then this container is used to invoke the 
action (see the sketch right after this list)

Below is a summary (approx 10x throughput with concurrent requests) and I would 
like to get some feedback on:
- what are the cases for actions that require container isolation per request? 
Node is a good example that should NOT need this, but maybe there are cases 
where it is more important, e.g. where stateful actions are used?
- log collection approach: I have not attempted to resolve log collection 
issues. I would expect that revising the log sentinel marker to include the 
activation ID would help; logs stored with an activation would then include 
interleaved activations in some cases (which should be expected with concurrent 
request processing?), and some different logic would be needed to process logs 
after an activation completes (e.g. logs emitted at the start of an activation 
may have already been collected as part of another activation’s log collection, 
etc). A rough sketch of what I mean follows this list.
- advice on creating a PR to discuss this in more detail - should I wait for 
more of the container pooling changes to get to master? Or submit a PR to 
Markus’s new-containerpool branch?
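
Here is the rough sketch of the log collection idea referenced above. The 
revised sentinel format (the existing sentinel string with the activation ID 
appended) and the "collect everything up to this activation's sentinel" 
behavior are assumptions for illustration only, not what the runtimes emit 
today:

object ConcurrentLogCollection {

  // Hypothetical revised sentinel that carries the activation id.
  private val Sentinel = """XXX_THE_END_OF_A_WHISK_ACTIVATION_XXX (\S+)""".r

  // Collect logs for one activation from an interleaved container log stream:
  // take every line seen until this activation's sentinel appears, dropping
  // other activations' sentinels. Lines from other concurrent activations may
  // still be included - that is the interleaving trade-off mentioned above.
  def logsFor(activationId: String, containerLog: Seq[String]): Seq[String] =
    containerLog
      .takeWhile {
        case Sentinel(id) if id == activationId => false
        case _                                  => true
      }
      .filterNot {
        case Sentinel(_) => true
        case _           => false
      }

  def main(args: Array[String]): Unit = {
    val stream = Seq(
      "hello from activation A",
      "hello from activation B",
      "XXX_THE_END_OF_A_WHISK_ACTIVATION_XXX B",
      "more output from A",
      "XXX_THE_END_OF_A_WHISK_ACTIVATION_XXX A"
    )
    // Includes B's line as well, illustrating the expected interleaving.
    println(logsFor("A", stream))
  }
}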

Thanks
Tyson

Summary of loadtest report with max-concurrent ENABLED (I used 10000, but this 
limit wasn’t reached):
[Sat Apr 29 2017 16:32:37 GMT+0000 (UTC)] INFO Target URL:          https://192.168.99.100/api/v1/namespaces/_/actions/noopThroughputConcurrent?blocking=true
[Sat Apr 29 2017 16:32:37 GMT+0000 (UTC)] INFO Max requests:        10000
[Sat Apr 29 2017 16:32:37 GMT+0000 (UTC)] INFO Concurrency level:   10
[Sat Apr 29 2017 16:32:37 GMT+0000 (UTC)] INFO Agent:               keepalive
[Sat Apr 29 2017 16:32:37 GMT+0000 (UTC)] INFO
[Sat Apr 29 2017 16:32:37 GMT+0000 (UTC)] INFO Completed requests:  10000
[Sat Apr 29 2017 16:32:37 GMT+0000 (UTC)] INFO Total errors:        0
[Sat Apr 29 2017 16:32:37 GMT+0000 (UTC)] INFO Total time:          241.900480915 s
[Sat Apr 29 2017 16:32:37 GMT+0000 (UTC)] INFO Requests per second: 41
[Sat Apr 29 2017 16:32:37 GMT+0000 (UTC)] INFO Mean latency:        241.7 ms

Summary of loadtest report with max-concurrent DISABLED:
[Sat Apr 29 2017 19:21:51 GMT+0000 (UTC)] INFO Target URL:          https://192.168.99.100/api/v1/namespaces/_/actions/noopThroughput?blocking=true
[Sat Apr 29 2017 19:21:51 GMT+0000 (UTC)] INFO Max requests:        10000
[Sat Apr 29 2017 19:21:51 GMT+0000 (UTC)] INFO Concurrency level:   10
[Sat Apr 29 2017 19:21:51 GMT+0000 (UTC)] INFO Agent:               keepalive
[Sat Apr 29 2017 19:21:51 GMT+0000 (UTC)] INFO
[Sat Apr 29 2017 19:21:51 GMT+0000 (UTC)] INFO Completed requests:  10000
[Sat Apr 29 2017 19:21:51 GMT+0000 (UTC)] INFO Total errors:        19
[Sat Apr 29 2017 19:21:51 GMT+0000 (UTC)] INFO Total time:          2770.658048791 s
[Sat Apr 29 2017 19:21:51 GMT+0000 (UTC)] INFO Requests per second: 4
[Sat Apr 29 2017 19:21:51 GMT+0000 (UTC)] INFO Mean latency:        2767.3 ms
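
(For reference, those totals work out to 10000 requests / 241.9 s ≈ 41 RPS with 
max-concurrent enabled vs. 10000 requests / 2770.7 s ≈ 3.6 RPS without - 
roughly the 10x difference summarized above.)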





[1] https://github.com/openwhisk/openwhisk/issues/2026
[2] https://github.com/markusthoemmes/openwhisk/tree/new-containerpool
[3] https://github.com/markusthoemmes/openwhisk-performance

