2020-01-11 01:37:41 UTC - Ali Tariq: In order to further investigate, I plotted the ids of successful requests (every request has a unique id, assigned linearly) - as you can see, there is a big continuous chunk that is dropped altogether, which signifies that the whole queue was dropped. Could it be an implementation bug? https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1578706661059000?thread_ts=1578504210.020500&cid=C3TPCAQG1 ---- 2020-01-11 01:40:26 UTC - Rodric Rabbah: whoa these are neat graphs - give me a sec to process https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1578706826059400?thread_ts=1578504210.020500&cid=C3TPCAQG1 ---- 2020-01-11 01:42:19 UTC - Rodric Rabbah: what's the x-axis? time? https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1578706939059600?thread_ts=1578504210.020500&cid=C3TPCAQG1 ---- 2020-01-11 01:44:27 UTC - Rodric Rabbah: I don't immediately see why the requests would get dropped unless they were 429 and never accepted in the first place - are you checking the status codes of the invokes?
also, you can check the kafka queue length for correlation as it should report the invoker queue lengths https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1578707067059900?thread_ts=1578504210.020500&cid=C3TPCAQG1 ---- 2020-01-11 01:52:00 UTC - Ali Tariq: In the first graph ... the x-axis is time; in the second graph we don't really need an x-axis (although it's also a time axis, just not marked), because the requests were assigned numbers as they were launched (starting from 0 till the last request, 100k) https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1578707520060100?thread_ts=1578504210.020500&cid=C3TPCAQG1 ---- 2020-01-11 01:53:13 UTC - Ali Tariq: Yeah ... there are a few 429s as well, but most of the ones that get queued up don't return 429 and ultimately time out (apiGateway timeout). https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1578707593060300?thread_ts=1578504210.020500&cid=C3TPCAQG1 ---- 2020-01-11 01:53:51 UTC - Ali Tariq: How do I check the kafka queue length? https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1578707631060500?thread_ts=1578504210.020500&cid=C3TPCAQG1 ---- 2020-01-11 03:23:06 UTC - Rodric Rabbah: You have to enable Kamon I think @chetanm ? https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1578712986067800?thread_ts=1578504210.020500&cid=C3TPCAQG1 ---- 2020-01-11 03:23:40 UTC - Rodric Rabbah: Timing out at the api gw is ok in that the request is still in the system. https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1578713020068500?thread_ts=1578504210.020500&cid=C3TPCAQG1 ---- 2020-01-11 03:25:12 UTC - Rodric Rabbah: I don’t know how to explain that dip in the first graph. Assuming your load generator is not the issue, if you’re sustaining load the curve should stay at the 1K max https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1578713112070400?thread_ts=1578504210.020500&cid=C3TPCAQG1 ---- 2020-01-11 03:25:34 UTC - Rodric Rabbah: Until you stop, at which point it drains the queue until it’s empty https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1578713134071100?thread_ts=1578504210.020500&cid=C3TPCAQG1 ---- 2020-01-11 03:25:39 UTC - Ali Tariq: yes ... but the big missing chunk in the middle shows that the whole queue (& all queued requests) got dropped somehow - possible crash maybe https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1578713139071400?thread_ts=1578504210.020500&cid=C3TPCAQG1 ---- 2020-01-11 03:25:42 UTC - Rodric Rabbah: Did anything die? https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1578713142071700?thread_ts=1578504210.020500&cid=C3TPCAQG1 ---- 2020-01-11 03:25:47 UTC - Rodric Rabbah: Invoker? https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1578713147072000?thread_ts=1578504210.020500&cid=C3TPCAQG1 ---- 2020-01-11 03:26:24 UTC - Ali Tariq: I am positive the issue is not in the workload generator because I have tested the same workload on various serverless platforms +1 : Rodric Rabbah https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1578713184072900?thread_ts=1578504210.020500&cid=C3TPCAQG1 ---- 2020-01-11 03:26:37 UTC - Rodric Rabbah: Once a request is in kafka it’s persisted https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1578713197073300?thread_ts=1578504210.020500&cid=C3TPCAQG1 ---- 2020-01-11 03:27:14 UTC - Rodric Rabbah: And each invoker will only buffer a limited look-ahead, so the loss on an invoker going down is limited. Wouldn’t explain 18k https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1578713234074600?thread_ts=1578504210.020500&cid=C3TPCAQG1 ---- 2020-01-11 03:27:35 UTC - Ali Tariq: that's true! https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1578713255074800?thread_ts=1578504210.020500&cid=C3TPCAQG1 ----
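A quick way to check the invoker queue lengths mentioned above, without wiring up Kamon, is Kafka's own consumer-group tooling: the lag on each per-invoker topic is roughly the number of activations still queued for that invoker. A minimal sketch, assuming a kube deployment with an `ow-kafka-0` pod in the `openwhisk` namespace and the default `invoker0`, `invoker1`, ... topic/consumer-group naming (adjust the names, and the script path if it is not on the PATH inside your Kafka image):

```bash
# List the consumer groups Kafka knows about (should include one per invoker).
kubectl exec -n openwhisk ow-kafka-0 -- \
  kafka-consumer-groups.sh --bootstrap-server localhost:9092 --list

# Describe one invoker's group; the LAG column is produced-but-unconsumed
# messages, i.e. activations sitting in that invoker's queue.
kubectl exec -n openwhisk ow-kafka-0 -- \
  kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
  --describe --group invoker0
```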
2020-01-11 03:28:38 UTC - Rodric Rabbah: Something is off - we run 10k/s regularly as part of a perf sniff test and haven’t seen something like this. Thinking... https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1578713318076100?thread_ts=1578504210.020500&cid=C3TPCAQG1 ---- 2020-01-11 03:28:59 UTC - Ali Tariq: I want to add ... this is not an anomalous result, I have run this test 4 times, and it occurred every single time https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1578713339076300?thread_ts=1578504210.020500&cid=C3TPCAQG1 ---- 2020-01-11 03:29:38 UTC - Rodric Rabbah: How are you getting the data for the first graph? https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1578713378076900?thread_ts=1578504210.020500&cid=C3TPCAQG1 ---- 2020-01-11 03:30:07 UTC - Ali Tariq: It must have something to do with the `actionInvokesConcurrent` (set at 11000) and `containerMemoryPool` (set at 1000 containers). https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1578713407077400?thread_ts=1578504210.020500&cid=C3TPCAQG1 ---- 2020-01-11 03:30:43 UTC - Rodric Rabbah: Container memory pool I believe limits the number of containers per invoker. https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1578713443078400?thread_ts=1578504210.020500&cid=C3TPCAQG1 ---- 2020-01-11 03:31:00 UTC - Ali Tariq: I am running my custom logging server that collects the data from inside the running functions (http request packets) https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1578713460079100?thread_ts=1578504210.020500&cid=C3TPCAQG1 ---- 2020-01-11 03:31:18 UTC - Rodric Rabbah: Concurrent Invokers is how much capacity a single namespace can allocate. The ratio is how many invokes a single namespace monopolizes https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1578713478079800?thread_ts=1578504210.020500&cid=C3TPCAQG1 ---- 2020-01-11 03:32:05 UTC - Ali Tariq: I have set 50 containers (`containerMemoryPool`) per invoker and I have 20 invokers in the deployment https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1578713525080900?thread_ts=1578504210.020500&cid=C3TPCAQG1 ---- 2020-01-11 03:32:06 UTC - Rodric Rabbah: Have you looked at how many activation records are in the db to see that nothing was dropped, or how many are dropped? https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1578713526081100?thread_ts=1578504210.020500&cid=C3TPCAQG1 ---- 2020-01-11 03:36:10 UTC - Ali Tariq: I did not for the last run; I will in the next run. But if a request didn't run, there wouldn't be any activation record for that request in the db! (for example 429s ... or the above dip for that matter) https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1578713770081800?thread_ts=1578504210.020500&cid=C3TPCAQG1 ---- 2020-01-11 04:21:14 UTC - Ali Tariq: In Openwhisk (consider we only have a single namespace, default), we have `actionInvokesConcurrent`, which limits how many active requests can be in the system (`actionInvokesConcurrent` = running + queuedUp). Then we have the actual `containerMemoryPool`, which sets the limit on actions running in parallel. What are the best practices for these configurations? I know `actionInvokesConcurrent` >= `containerMemoryPool` should hold, otherwise none of the requests will ever get queued up (to absorb burst workloads) and everything above `actionInvokesConcurrent` would get statusCode `429`, but what should the optimal ratio/values be? Also, doesn't the difference between `actionInvokesConcurrent` and `containerMemoryPool` kind of decide the queue length for the system (the only namespace)? https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1578716474083100 ----
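On the earlier question about activation records in the db: with a kube deployment the count can be read straight out of CouchDB, since every completed activation produces one document. A rough sketch; the service name `ow-couchdb`, the database name `test_activations` (it depends on the configured db prefix; it may be `whisk_local_activations` or similar) and the credential variables are all assumptions to adjust for the actual deployment:

```bash
# Forward CouchDB locally, list the databases, then read the document count of
# the activations database (the count includes a handful of design docs).
kubectl port-forward -n openwhisk svc/ow-couchdb 5984:5984 &
curl -s -u "$COUCH_USER:$COUCH_PASS" http://localhost:5984/_all_dbs
curl -s -u "$COUCH_USER:$COUCH_PASS" http://localhost:5984/test_activations | jq .doc_count
```

Comparing `doc_count` against the 100k requests launched (minus the 429s, which never produce a record) shows how many activations actually made it into the system.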
2020-01-11 04:28:02 UTC - Ali Tariq: okay, thanks to our discussion - I went over the request return codes this time! It turns out that out of the ~21k drops, about 9k are due to 429s and 12k are due to 503s: the `system is overloaded or down for maintenance` error. Currently trying to debug what caused the invokers to become unhealthy (I have 20 invokers and for some reason all became unhealthy!) But this surely explains the dip. https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1578716882083300?thread_ts=1578504210.020500&cid=C3TPCAQG1 ---- 2020-01-11 05:18:57 UTC - Ali Tariq: Looking at the logs from the `controller` ... all 20 `invokers` were healthy, after which, one by one, all became unresponsive - from ```[2020-01-11T03:44:31.058Z] [INFO] [#tid_sid_invokerHealth] [InvokerPool] invoker status changed to 0 -> Healthy, 1 -> Healthy, 2 -> Healthy, 3 -> Healthy, 4 -> Healthy, 5 -> Healthy, 6 -> Healthy, 7 -> Healthy, 8 -> Healthy, 9 -> Healthy, 10 -> Healthy, 11 -> Healthy, 12 -> Healthy, 13 -> Healthy, 14 -> Healthy, 15 -> Healthy, 16 -> Healthy, 17 -> Healthy, 18 -> Healthy, 19 -> Healthy``` to messages like these ```[2020-01-11T03:52:37.630Z] [INFO] [#tid_sid_invokerHealth] [InvokerActor] invoker18 is unresponsive [marker:loadbalancer_invokerState.unresponsive_counter:512082]``` Looking further in the controller, the main issue seems to be ```[2020-01-11T03:56:38.627Z] [ERROR] [#tid_8Imd7UIW3OoaJhdAt6OMJO7IuIUPB2No] [ShardingContainerPoolBalancer] failed to schedule activation 26ebd0b5cbca44cfabd0b5cbca64cf5f, action 'guest/lambda1@0.0.1' (blackbox), ns 'guest' - invokers to use: Map(Unresponsive -> 20)``` Looking for the reason for the above issue in the `invoker` logs at about the same timestamp, I found ```[2020-01-11T03:56:38.187Z] [ERROR] [#tid_eIPUPBXTR3TN0joUTclIpPt5NF6rOpIx] [ContainerPool] Rescheduling Run message, too many message in the pool, freePoolSize: 0 containers and 0 MB, busyPoolSize: 50 containers and 12800 MB, maxContainersMemory 12800 MB, userNamespace: guest, action: ExecutableWhiskAction/guest/lambda1@0.0.1, needed memory: 256 MB, waiting messages: 49``` I understand the error: it wants to create new containers for incoming requests but there is no space in the `containerMemoryPool`. In such cases, shouldn't it simply queue the request? Why would it cause invokers to become unresponsive! https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1578719937083500?thread_ts=1578504210.020500&cid=C3TPCAQG1 ---- 2020-01-11 05:52:01 UTC - Ali Tariq: So in short, if you follow the first graph, it started serving the requests at time 0. At the 10s mark, serving hits the concurrency limit of `1000`; at the 280s mark, we start receiving `429`s (queue length at this point should be approximately 9-10k). By the 617s mark, we had received 9k `429`s, after which we start getting `503`s instead of `429`s. After some time, the invokers come back up again and everything remaining finishes fine. The only thing left to explain is why the invokers became unhealthy. Could it be that, because they had been sending too many `429`s, the invokers got overloaded (but if I am not wrong, the controller sends the `429`s)? https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1578721921084100?thread_ts=1578504210.020500&cid=C3TPCAQG1 ----
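The state transitions and rejections described above can be pulled straight from the logs. A small sketch, with pod names (`ow-controller-0`, `ow-invoker-0`, namespace `openwhisk`) borrowed from the kube deployment discussed later in this log and grep patterns taken from the log lines quoted above; adjust the names for the actual cluster:

```bash
# When did each invoker change state, and how often did the balancer fail to
# schedule an activation?
kubectl logs ow-controller-0 -n openwhisk \
  | grep -E 'invoker status changed|is unresponsive|failed to schedule activation'

# How often did this invoker's ContainerPool push work back because the pool
# was full (the "Rescheduling Run message" error above)?
kubectl logs ow-invoker-0 -n openwhisk | grep -c 'Rescheduling Run message'
```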
2020-01-11 05:52:25 UTC - Ali Tariq: Also, what are `busyPoolSize` & `waiting messages`? (from the invoker log message) - I found the code in `ContainerPool.scala` but due to the lack of comments, couldn't understand their purpose https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1578721945084300?thread_ts=1578504210.020500&cid=C3TPCAQG1 ---- 2020-01-11 09:15:42 UTC - Michele Sciabarra: @chetanm @Dave Grove @Rodric Rabbah thanks for approving the openwhisk standalone docker image, however to merge it looks like another approval is needed. I would like to ensure the standalone image is built, however I removed this change that publishes the image <https://github.com/apache/openwhisk/pull/4782/commits/0e5e9cb855980bcbbc8e8af45ba4ebca0d3cf0f7> because it breaks the build - should we manually create a docker image to be able to upload it? https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1578734142087900 ---- 2020-01-11 09:16:28 UTC - Michele Sciabarra: @chetanm do you have any idea how I could check for readiness of openwhisk before starting the playground? https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1578734188088700 ---- 2020-01-11 11:00:41 UTC - giusdp: @Rodric Rabbah Hey, I was checking out the Invoker class and saw it has a `main` method; the args that are passed to it are used to build the InstanceId. I thought about adding a custom argument to the CLI line to launch an invoker, since that gets picked up by the `main` method and gets added to the InstanceId. Do you know what runs the `main`/where can I find the command line? https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1578740441088800?thread_ts=1578572550.021500&cid=C3TPCAQG1 ---- 2020-01-11 11:58:24 UTC - giusdp: Ok I found the ansible file that launches that line. Now the problem is to let my own deployment of openwhisk use that ansible file https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1578743904089000?thread_ts=1578572550.021500&cid=C3TPCAQG1 ---- 2020-01-11 16:31:23 UTC - giusdp: Hello, how can I access the logs of the controller of openwhisk deployed on a kubernetes cluster? https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1578760283089700?thread_ts=1578760283.089700&cid=C3TPCAQG1 ---- 2020-01-11 16:43:43 UTC - Rodric Rabbah: I think you just use kubectl https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1578761023090100?thread_ts=1578760283.089700&cid=C3TPCAQG1 ---- 2020-01-11 16:54:45 UTC - giusdp: You mean kubectl logs ow-controller-0? I tried that but it doesn't give me anything https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1578761685090300?thread_ts=1578760283.089700&cid=C3TPCAQG1 ---- 2020-01-11 17:00:15 UTC - Ali Tariq: you also need to specify the namespace, `kubectl logs ow-controller-0 -n openwhisk`, if that's the namespace you created with helm https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1578762015090500?thread_ts=1578760283.089700&cid=C3TPCAQG1 ---- 2020-01-11 17:04:15 UTC - giusdp: Yes I meant that I did that. It doesn't show any logs unfortunately. After a while it even gives a timeout error https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1578762255090800?thread_ts=1578760283.089700&cid=C3TPCAQG1 ---- 2020-01-11 17:13:21 UTC - Ali Tariq: could you share the output of `kubectl get pods -n openwhisk`?
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1578762801091000?thread_ts=1578760283.089700&cid=C3TPCAQG1 ---- 2020-01-11 17:22:39 UTC - giusdp: ```
NAME                                                       READY   STATUS      RESTARTS   AGE
ow-alarmprovider-ccc874444-c99wm                           1/1     Running     0          5m57s
ow-apigateway-6b9b9d974f-2dw9v                             1/1     Running     0          5m56s
ow-cloudantprovider-758d8bb68b-frdhb                       1/1     Running     0          5m57s
ow-controller-0                                            1/1     Running     0          5m56s
ow-couchdb-5cd8484499-vgbmm                                1/1     Running     0          5m57s
ow-gen-certs-hx965                                         0/1     Completed   0          5m56s
ow-init-couchdb-gszkq                                      0/1     Completed   0          5m56s
ow-install-packages-75zs9                                  0/1     Error       0          5m56s
ow-install-packages-vjccd                                  1/1     Running     0          24s
ow-invoker-0                                               1/1     Running     0          5m56s
ow-kafka-0                                                 1/1     Running     0          5m56s
ow-kafkaprovider-69544b9fc9-sppr5                          1/1     Running     0          5m57s
ow-nginx-7ff4cb6ff9-mkbw4                                  1/1     Running     0          5m56s
ow-redis-bf84f7756-drnbb                                   1/1     Running     0          5m57s
ow-wskadmin                                                1/1     Running     0          5m57s
ow-zookeeper-0                                             1/1     Running     0          5m56s
wskow-invoker-00-1-prewarm-nodejs10                        1/1     Running     0          2m29s
wskow-invoker-00-2-prewarm-nodejs10                        1/1     Running     0          2m30s
wskow-invoker-00-3-whisksystem-invokerhealthtestaction0    1/1     Running     0          2m28s
wskow-invoker-00-4-prewarm-nodejs10                        1/1     Running     0          34s
``` https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1578763359091200?thread_ts=1578760283.089700&cid=C3TPCAQG1 ---- 2020-01-11 17:22:57 UTC - giusdp: The install packages pod gave an error and restarted, but I can already create and invoke actions https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1578763377091400?thread_ts=1578760283.089700&cid=C3TPCAQG1 ---- 2020-01-11 17:24:54 UTC - Ali Tariq: does `ow-invoker-0` also not show any logs? https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1578763494091600?thread_ts=1578760283.089700&cid=C3TPCAQG1 ---- 2020-01-11 17:26:06 UTC - giusdp: No, just ran `kubectl logs ow-invoker-0 -n openwhisk` and again no output till the timeout https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1578763566091800?thread_ts=1578760283.089700&cid=C3TPCAQG1 ---- 2020-01-11 17:27:41 UTC - Ali Tariq: that is really strange, normally that's the way you access the logs. It could have something to do with the failing pod - do you mind sharing the `mycluster.yaml` you used to create the deployment? https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1578763661092100?thread_ts=1578760283.089700&cid=C3TPCAQG1 ---- 2020-01-11 17:35:36 UTC - giusdp: ```
whisk:
  ingress:
    type: NodePort
    apiHostName: ip
    apiHostPort: 31001
nginx:
  httpsNodePort: 31001
k8s:
  persistence:
    hasDefaultStorageClass: "false"
    explicitStorageClass: "openwhisk-nfs"
providers:
  alarm:
    enabled: false
  kafka:
    enabled: false
  cloudant:
    enabled: false
controller:
  imageName: "giusdp/controller"
  imageTag: "latest"
``` https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1578764136092300?thread_ts=1578760283.089700&cid=C3TPCAQG1 ---- 2020-01-11 17:36:25 UTC - giusdp: This is the mycluster file, I was trying to use a modified controller, but I can't access any log :cry: https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1578764185092500?thread_ts=1578760283.089700&cid=C3TPCAQG1 ---- 2020-01-11 17:37:45 UTC - Ali Tariq: did you first test accessing logs with the default controller image? might help narrow down the issue! https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1578764265092800?thread_ts=1578760283.089700&cid=C3TPCAQG1 ----
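For the `ow-install-packages-75zs9` pod showing `Error` above: `kubectl describe` and the namespace event stream are served by the API server rather than by the kubelet's log endpoint, so they usually still work even when `kubectl logs` times out. A small sketch:

```bash
# Inspect the failed pod and recent events without relying on the kubelet
# log endpoint that "kubectl logs" needs.
kubectl describe pod ow-install-packages-75zs9 -n openwhisk
kubectl get events -n openwhisk --sort-by=.metadata.creationTimestamp
```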
2020-01-11 17:40:48 UTC - giusdp: I'll try now https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1578764448093000?thread_ts=1578760283.089700&cid=C3TPCAQG1 ---- 2020-01-11 17:51:41 UTC - giusdp: Unfortunately nothing again :sweat_smile: https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1578765101093200?thread_ts=1578760283.089700&cid=C3TPCAQG1 ---- 2020-01-11 17:53:12 UTC - giusdp: I don't know if it could be a problem from the master node, for example it can't establish a connection to the worker node to retrieve the logs? But if this was the cause openwhisk shouldn't work at all... https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1578765192093400?thread_ts=1578760283.089700&cid=C3TPCAQG1 ---- 2020-01-11 17:57:24 UTC - giusdp: Just to try, I ran kubectl logs on the apiserver pod that is in the kube-system namespace on the master node, and it worked https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1578765444093600?thread_ts=1578760283.089700&cid=C3TPCAQG1 ---- 2020-01-11 18:02:29 UTC - Ali Tariq: Wow, that's a new one for me! ```
whisk:
  ingress:
    type: NodePort
    apiHostName: ip
    apiHostPort: 31001
invoker:
  containerFactory:
    impl: "kubernetes"
nginx:
  httpsNodePort: 31001
k8s:
  persistence:
    enabled: false
``` I normally use this configuration, never really changed the `providers` defaults & never had any issue! - I doubt that could be the issue. https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1578765749093800?thread_ts=1578760283.089700&cid=C3TPCAQG1 ---- 2020-01-11 18:05:29 UTC - Ali Tariq: could you try `k8s:persistence:enabled:false`? maybe your persistent volume isn't letting pods write stuff down to storage. https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1578765929094000?thread_ts=1578760283.089700&cid=C3TPCAQG1 ---- 2020-01-11 18:12:55 UTC - giusdp: I just deployed openwhisk with your mycluster file but it gave me the same problem. ```Error from server: Get <https://192.168.1.140:10250/containerLogs/openwhisk/ow-controller-0/controller>: dial tcp 192.168.1.140:10250: i/o timeout``` https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1578766375094300?thread_ts=1578760283.089700&cid=C3TPCAQG1 ---- 2020-01-11 18:13:23 UTC - Ali Tariq: is the same pod still failing? https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1578766403094500?thread_ts=1578760283.089700&cid=C3TPCAQG1 ---- 2020-01-11 18:47:12 UTC - giusdp: Yes, no logs from any pod in the worker node are received. I wonder if it's a problem with the connection, like a port problem https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1578768432094700?thread_ts=1578760283.089700&cid=C3TPCAQG1 ---- 2020-01-11 19:07:41 UTC - Ali Tariq: to me ... it seems more like a k8s cluster setup issue, maybe someone else could provide better insights. https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1578769661096200?thread_ts=1578760283.089700&cid=C3TPCAQG1 ---- 2020-01-11 19:12:53 UTC - Ali Tariq: @Rodric Rabbah apologies for disturbing you, but could you provide any insights on this case? I am trying to push for a submission for which I ran some workload benchmarks on Openwhisk. Just want to make sure the above-mentioned behavior is not because of some misconfiguration! https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1578769973096400?thread_ts=1578647722.045200&cid=C3TPCAQG1 ----
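The `dial tcp 192.168.1.140:10250: i/o timeout` above points at the kubelet: `kubectl logs` works by having the API server fetch container logs from the kubelet on the worker node (port 10250 by default), so a firewall or routing problem on that port breaks log access while leaving the rest of OpenWhisk running. A quick check from the master node, assuming `nc` is available; the IP is taken from the error message:

```bash
# Confirm the address the node reports, then probe the kubelet port directly.
kubectl get nodes -o wide
nc -vz 192.168.1.140 10250   # a timeout here confirms the kubelet port is unreachable
```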
2020-01-11 19:21:41 UTC - Rodric Rabbah: The scheduler tries to reuse containers, but favors autoscaling over reuse (because it doesn't track the hold time for a resource). How much time did you wait between the two 400-invoke batches? If they’re sufficiently close in time I would have expected greater reuse. https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1578770501096600?thread_ts=1578647722.045200&cid=C3TPCAQG1 ---- 2020-01-11 19:25:57 UTC - Ali Tariq: right away! (10-20 seconds) - but in the `actionA` & `actionB` case, why does it start destroying `actionA` when it still has space in `containerMemoryPool`? https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1578770757096800?thread_ts=1578647722.045200&cid=C3TPCAQG1 ---- 2020-01-11 19:26:26 UTC - giusdp: Thanks for the help anyway! I'll look more into it https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1578770786097000?thread_ts=1578760283.089700&cid=C3TPCAQG1 ---- 2020-01-11 19:28:14 UTC - Rodric Rabbah: the invokers don’t overcommit memory - so my explanation is that the max number of containers on an invoker was exceeded, so the garbage collector reclaimed the oldest containers https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1578770894097400?thread_ts=1578647722.045200&cid=C3TPCAQG1 ---- 2020-01-11 19:28:27 UTC - Rodric Rabbah: a container while paused doesn't consume cpu but it will consume memory https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1578770907097600?thread_ts=1578647722.045200&cid=C3TPCAQG1 ---- 2020-01-11 19:28:54 UTC - Rodric Rabbah: cpu is overcommitted but not memory https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1578770934097800?thread_ts=1578647722.045200&cid=C3TPCAQG1 ---- 2020-01-11 19:29:42 UTC - Rodric Rabbah: it’s also possible in the actionA batches that the 600 pods aren’t all for actionA? the stem cell pool is replenished as the stem cells are used - are these actions of the stem cell kind? https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1578770982098000?thread_ts=1578647722.045200&cid=C3TPCAQG1 ---- 2020-01-11 19:30:38 UTC - Rodric Rabbah: happy to read the paper draft to understand the methodology better, then I may be able to offer more insights https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1578771038098200?thread_ts=1578647722.045200&cid=C3TPCAQG1 ---- 2020-01-11 19:30:42 UTC - Ali Tariq: I created simple `blackbox` kind actions ... I don't know about `stem cells` https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1578771042098400?thread_ts=1578647722.045200&cid=C3TPCAQG1 ---- 2020-01-11 19:30:55 UTC - Rodric Rabbah: so these are “docker” actions? https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1578771055098700?thread_ts=1578647722.045200&cid=C3TPCAQG1 ---- 2020-01-11 19:31:00 UTC - Ali Tariq: yes https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1578771060098900?thread_ts=1578647722.045200&cid=C3TPCAQG1 ---- 2020-01-11 19:31:10 UTC - Rodric Rabbah: wow ok - did you pre-pull the images? https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1578771070099100?thread_ts=1578647722.045200&cid=C3TPCAQG1 ----
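On pre-pulling: one way to warm every node is to pull the blackbox image on each worker up front, so the first invocation per invoker doesn't pay the image-pull cost. A sketch assuming SSH access from the master to the worker nodes; the image name is a hypothetical placeholder for whatever the blackbox action really uses:

```bash
# Pull the blackbox action's image on every node ahead of the benchmark run.
IMAGE="your-registry/lambda1:latest"   # hypothetical - substitute the real image
for node in $(kubectl get nodes -o jsonpath='{.items[*].status.addresses[?(@.type=="InternalIP")].address}'); do
  ssh "$node" docker pull "$IMAGE"
done
```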
2020-01-11 19:31:53 UTC - Rodric Rabbah: stem cells are containers that can run actions of a specific kind (node, python, etc) and are a system optimization to avoid creating new containers for a new action; they can save you 500ms or so by not running docker run https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1578771113099300?thread_ts=1578647722.045200&cid=C3TPCAQG1 ---- 2020-01-11 19:32:09 UTC - Ali Tariq: I didn't - I thought it pulls on the first invocation and tries to reuse the pull on all consecutive invocations https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1578771129099500?thread_ts=1578647722.045200&cid=C3TPCAQG1 ---- 2020-01-11 19:32:30 UTC - Rodric Rabbah: yes, per invoker https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1578771150099800?thread_ts=1578647722.045200&cid=C3TPCAQG1 ---- 2020-01-11 19:34:36 UTC - Ali Tariq: so, I should pre-pull on all invokers? that's 20! also, after I ran the workload once ... every invoker has the pulled images. I waited for some time so that idle containers could be deallocated (`10 minutes`) - still the same behavior https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1578771276100000?thread_ts=1578647722.045200&cid=C3TPCAQG1 ---- 2020-01-11 19:36:22 UTC - Rodric Rabbah: if you're repeating the experiments on the same nodes, then the images should be there - so disregard the first run, for example, and all the images will be resident https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1578771382100200?thread_ts=1578647722.045200&cid=C3TPCAQG1 ---- 2020-01-11 19:36:30 UTC - Ali Tariq: ```my explanation is that the max number of containers on an invoker was exceeded so the garbage collector reclaimed the oldest containers``` on this point ... I thought actions are assigned in a round-robin fashion - how could one invoker get exceeded? https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1578771390100400?thread_ts=1578647722.045200&cid=C3TPCAQG1 ---- 2020-01-11 19:38:15 UTC - Ali Tariq: let me re-iterate, this only happens on kube-deploy ... the docker-compose deployment showed absolute bin-filling - and looking at the code in `ContainerPool.scala`, it's also written like absolute bin-filling https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1578771495100600?thread_ts=1578647722.045200&cid=C3TPCAQG1 ---- 2020-01-11 19:38:21 UTC - Rodric Rabbah: actions are assigned to invokers based on indexing of the invoker pool using a co-prime hash algorithm (to reduce collisions/noisy neighbors), so they’re round robin from a sub-sequence of all invokers; when the high-water mark on each invoker is reached, requests are queued on a designated home invoker for that user https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1578771501100800?thread_ts=1578647722.045200&cid=C3TPCAQG1 ---- 2020-01-11 19:39:32 UTC - Rodric Rabbah: I’m not sure what kube does differently - I was assuming you're not using the kube container factory https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1578771572101000?thread_ts=1578647722.045200&cid=C3TPCAQG1 ---- 2020-01-11 19:39:37 UTC - Ali Tariq: but then again `docker-compose` was only a single invoker, so it might not be a fair comparison https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1578771577101200?thread_ts=1578647722.045200&cid=C3TPCAQG1 ---- 2020-01-11 19:40:20 UTC - Rodric Rabbah: did you remove the blackbox container segregation?
by default blackbox actions are allocated only 10% of the invoker pool https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1578771620101400?thread_ts=1578647722.045200&cid=C3TPCAQG1 ---- 2020-01-11 19:40:43 UTC - Ali Tariq: no, I'm using the `kubernetesContainerFactory`, although for the sake of completeness I did test with the `dockerContainerFactory` as well https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1578771643101600?thread_ts=1578647722.045200&cid=C3TPCAQG1 ---- 2020-01-11 19:41:04 UTC - Ali Tariq: yes I did - changed to `100%` for blackbox +1 : Rodric Rabbah https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1578771664101800?thread_ts=1578647722.045200&cid=C3TPCAQG1 ---- 2020-01-11 19:42:03 UTC - Ali Tariq: when I say `docker-compose` it's the `apache/openwhisk-devtools/docker-compose` repository https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1578771723102100?thread_ts=1578647722.045200&cid=C3TPCAQG1 ---- 2020-01-11 19:42:28 UTC - Rodric Rabbah: right that’s just one invoker iirc https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1578771748102400?thread_ts=1578647722.045200&cid=C3TPCAQG1 ---- 2020-01-11 19:42:39 UTC - Rodric Rabbah: so none of the scheduling heuristics kick in/apply https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1578771759102600?thread_ts=1578647722.045200&cid=C3TPCAQG1 ---- 2020-01-11 19:43:20 UTC - Ali Tariq: okay ... that helps! https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1578771800102800?thread_ts=1578647722.045200&cid=C3TPCAQG1 ---- 2020-01-11 19:43:35 UTC - Rodric Rabbah: where are you submitting? https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1578771815103000?thread_ts=1578647722.045200&cid=C3TPCAQG1 ---- 2020-01-11 19:43:40 UTC - Ali Tariq: atc +1 : Rodric Rabbah https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1578771820103200?thread_ts=1578647722.045200&cid=C3TPCAQG1 ----
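Since the KubernetesContainerFactory names action pods after the invoker that owns them (the `wskow-invoker-00-...` pods in the listing earlier in this log; the `wskow-` prefix comes from that deployment's release name), the spread of action containers across invokers - and therefore how round-robin versus bin-filling the placement really is - can be checked directly:

```bash
# Count action containers per invoker index; an even spread matches the
# co-prime round-robin placement described above, a heavy skew looks like
# bin-filling. Adjust the "wskow" prefix to the actual release name.
kubectl get pods -n openwhisk -o name \
  | grep wskow-invoker \
  | grep -oE 'invoker-[0-9]+' \
  | sort | uniq -c | sort -rn
```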