2020-11-03 00:50:51 UTC - Rodric Rabbah: The max duration is configurable 
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1604364651218400?thread_ts=1604283859.214600&cid=C3TPCAQG1
----
2020-11-03 19:46:41 UTC - Brendan Doyle: Do offline invokers get included in 
the scheduling algorithm for selecting home invokers? It looks like the state 
is just sent from the invoker pool, and offline invokers are included in the 
`/invokers` api, which calls a function in the load balancer, so it seems like 
they are counted in the total invokers for the hashing algorithm. I'm digging 
through the code and don't see anything to suggest otherwise.
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1604432801220900?thread_ts=1604432801.220900&cid=C3TPCAQG1
----
2020-11-03 19:47:24 UTC - Brendan Doyle: We have a few old invokers that no 
longer exist (though the kafka topics still exist) so I'm wondering if having 
around 20% of our invoker pool be offline is affecting our scheduling 
distribution.
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1604432844221000?thread_ts=1604432801.220900&cid=C3TPCAQG1
----
2020-11-03 19:53:29 UTC - Dominic Kim: I suppose no. IIRC, offline invokers are 
automatically generated by the max invoker ID.
For example, if you have two online invokers, invoker0 and invoker10, all 
invoker1~9 are automatically generated but offline.
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1604433209221300?thread_ts=1604432801.220900&cid=C3TPCAQG1
----
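The padding behaviour described above can be sketched roughly as follows. This 
is a minimal Python illustration of the idea as recalled in the thread, not the 
OpenWhisk code: `pad_invokers` and the "up"/"offline" labels are hypothetical 
names chosen for the example.

```python
def pad_invokers(online_ids):
    """Return a status map covering every id from 0 up to the highest online id.

    Hypothetical helper: ids that are not online are filled in as "offline",
    mirroring the "generated by the max invoker ID" behaviour described above.
    """
    if not online_ids:
        return {}
    highest = max(online_ids)
    return {
        invoker_id: ("up" if invoker_id in online_ids else "offline")
        for invoker_id in range(highest + 1)
    }


if __name__ == "__main__":
    # Two online invokers, invoker0 and invoker10: ids 1..9 show up as offline.
    for invoker_id, status in pad_invokers({0, 10}).items():
        print(f"invoker{invoker_id}: {status}")
```

Under this assumption the map always has max ID + 1 entries, regardless of how 
many of them are actually reachable.
----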
2020-11-03 19:53:58 UTC - Dominic Kim: And only online invokers are involved in 
the scheduling.
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1604433238221500?thread_ts=1604432801.220900&cid=C3TPCAQG1
----
2020-11-03 19:57:02 UTC - Brendan Doyle: yea so we have invokers0-33. 
invokers0-5 are offline. The load balancer is going to use an invoker pool size 
of 34 to determine the home invoker so the home invoker may hit 0-5. It will 
just spill over to the next available invoker if it does land on 0-5 based on 
the step size when actually scheduling the activation and that will effectively 
act as the home invoker, but I'm wondering if this is impacting our uniform 
distribution. Or is my reading of the code there incorrect?
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1604433422221700?thread_ts=1604432801.220900&cid=C3TPCAQG1
----
2020-11-03 20:19:23 UTC - Brendan Doyle: I think this is because we never bring 
our controller cluster down and only perform rolling restarts. It seems like 
that cluster state is shared between controllers, so the offline invokers will 
never go away unless we re-bootstrap our cluster.
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1604434763222000?thread_ts=1604432801.220900&cid=C3TPCAQG1
----
2020-11-03 20:43:36 UTC - Rodric Rabbah: they should not factor into the 
scheduling (they’ll appear offline in the invoker map)
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1604436216222200?thread_ts=1604432801.220900&cid=C3TPCAQG1
----
2020-11-03 20:43:52 UTC - Rodric Rabbah: you’re right once in the map the lb 
doesn’t forget them
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1604436232222400?thread_ts=1604432801.220900&cid=C3TPCAQG1
----
2020-11-03 20:44:20 UTC - Rodric Rabbah: i suppose there could be a periodic 
purge to remove anything offline from the invoker map
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1604436260222600?thread_ts=1604432801.220900&cid=C3TPCAQG1
----
2020-11-03 20:47:02 UTC - Brendan Doyle: could you sanity check me then? We 
call `/invokers` api. It returns 0-33 and says 0-5 are offline. `/invokers` 
calls `invokerHealth()` in the load balancer and `invokerHealth()` just returns 
`_invokers` which is taken from the cluster state. So that implies to me that 
the offline count is used towards the scheduling hashing algorithm since 
`updateInvokers()` in the cluster management just does a take on `_invokers`
+1 : Dominic Kim
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1604436422222800?thread_ts=1604432801.220900&cid=C3TPCAQG1
----
2020-11-03 21:05:50 UTC - Rodric Rabbah: i concur - the step size is computed 
when the cluster size changes, and that does not exclude offline invokers
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1604437550223100?thread_ts=1604432801.220900&cid=C3TPCAQG1
----
2020-11-03 21:06:30 UTC - Rodric Rabbah: arguably this is a bug in your case, 
it could lead to unnecessary collisions since the step size may include an 
increasing number of unusable invokers
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1604437590223300?thread_ts=1604432801.220900&cid=C3TPCAQG1
----
2020-11-03 21:21:14 UTC - Brendan Doyle: it's not just step size right, the 
hash for home invokers could land on one of the offline invokers and then uses 
the steps to land on the next available invoker which then acts as the home 
invoker? I think that may have pretty big impact on the distribution and number 
of collisions
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1604438474223500?thread_ts=1604432801.220900&cid=C3TPCAQG1
----
2020-11-03 21:21:52 UTC - Brendan Doyle: or am I misinterpreting that part
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1604438512223700?thread_ts=1604432801.220900&cid=C3TPCAQG1
----
2020-11-03 21:22:29 UTC - Rodric Rabbah: you’re correct - the step size could 
lead to a bad pathology though, i think
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1604438549223900?thread_ts=1604432801.220900&cid=C3TPCAQG1
----
2020-11-03 21:22:44 UTC - Rodric Rabbah: it’s not so much the home invoker 
that’s the issue
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1604438564224100?thread_ts=1604432801.220900&cid=C3TPCAQG1
----
2020-11-03 21:23:15 UTC - Rodric Rabbah: it’s the sequence - the step size 
creates a sequence of invokers to check: 1, 5, 7, … so if you land on 1 and 
it’s unusable then it checks 5 then 7
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1604438595224300?thread_ts=1604432801.220900&cid=C3TPCAQG1
----
2020-11-03 21:23:39 UTC - Rodric Rabbah: but if those are all offline, you’re 
spending more time searching, and worse, the collisions may increase
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1604438619224500?thread_ts=1604432801.220900&cid=C3TPCAQG1
----
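The probing described above can be sketched roughly as follows. This is a 
simplified Python model, not the OpenWhisk load balancer code: 
`coprime_steps`, `probe_sequence` and `probes_until_usable` are hypothetical 
names, and the choice of step sizes (every number coprime with the pool size) 
is an assumption made so that each sequence visits every slot exactly once.

```python
from math import gcd


def coprime_steps(pool_size):
    """Step sizes coprime with the pool size (assumed rule for this sketch)."""
    return [s for s in range(1, pool_size + 1) if gcd(s, pool_size) == 1]


def probe_sequence(action_hash, pool_size):
    """Order in which slots are checked: home slot first, then home + k * step."""
    steps = coprime_steps(pool_size)
    home = action_hash % pool_size            # pool_size counts offline slots too
    step = steps[action_hash % len(steps)]
    return [(home + k * step) % pool_size for k in range(pool_size)]


def probes_until_usable(action_hash, pool_size, offline):
    """Return (chosen invoker, number of slots checked), or (None, pool_size)."""
    sequence = probe_sequence(action_hash, pool_size)
    for checked, invoker in enumerate(sequence, start=1):
        if invoker not in offline:
            return invoker, checked
    return None, pool_size


if __name__ == "__main__":
    offline = set(range(6))                   # invokers 0-5 offline, pool of 34
    for action_hash in (0, 6, 40):
        print(action_hash, probes_until_usable(action_hash, 34, offline))
```

In this model, a hash that lands on slot 0 with step 1 walks over all six dead 
slots before settling on invoker 6. Every hash whose home is in 0-5 gets 
redirected onto some online invoker, which is where the extra collisions would 
come from.
----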
2020-11-03 21:25:31 UTC - Rodric Rabbah: the heuristic doesn’t expect the 
invoker to stay offline indefinitely

i can think of several ways to address this - like purging the list 
periodically, adding a ttl on unusable invokers, or adding an admin api to 
re-compute the sequence as some examples
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1604438731224700?thread_ts=1604432801.220900&cid=C3TPCAQG1
----
2020-11-03 21:32:15 UTC - Brendan Doyle: yea I think we will take this on to 
fix asap. I like the periodic purge or ttl on unusable invokers. If an invoker 
gets cleaned up programmatically and is then brought back up, it should just 
get re-added with no problem, right?
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1604439135224900?thread_ts=1604432801.220900&cid=C3TPCAQG1
----
2020-11-03 21:33:28 UTC - Rodric Rabbah: right - auto discovery is already 
handled
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1604439208225100?thread_ts=1604432801.220900&cid=C3TPCAQG1
----
2020-11-03 21:34:52 UTC - Rodric Rabbah: i think this is good to have - you 
could do it fairly easily (:sweat_smile: ) by adding a time stamp to each 
invoker when it goes offline and then check the time difference compared to 
“now” --- a question: would you do a ttl on all unusable invokers or just 
offline
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1604439292225300?thread_ts=1604432801.220900&cid=C3TPCAQG1
----
2020-11-03 21:34:55 UTC - Rodric Rabbah: prob should do both
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1604439295225500?thread_ts=1604432801.220900&cid=C3TPCAQG1
----
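A minimal sketch of the timestamp-plus-ttl idea, assuming a plain dictionary 
keyed by invoker id. `InvokerTtlMap`, its methods and the status strings are 
hypothetical names for illustration only; how a purge would interact with the 
index-based invoker list in the real load balancer is left open here.

```python
import time


class InvokerTtlMap:
    """Tracks invoker status and forgets invokers that stay unusable too long."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock                    # injectable so it can be tested
        self.status = {}                      # invoker id -> status string
        self.unusable_since = {}              # invoker id -> time it became unusable

    def update(self, invoker_id, status):
        """Record a status change; any status other than "up" counts as unusable."""
        self.status[invoker_id] = status
        if status == "up":
            self.unusable_since.pop(invoker_id, None)
        else:
            # Keep the first timestamp so repeated offline reports do not reset the ttl.
            self.unusable_since.setdefault(invoker_id, self.clock())

    def purge(self):
        """Drop invokers that have been unusable for at least ttl seconds."""
        now = self.clock()
        expired = [
            invoker_id
            for invoker_id, since in self.unusable_since.items()
            if now - since >= self.ttl
        ]
        for invoker_id in expired:
            del self.status[invoker_id]
            del self.unusable_since[invoker_id]
        return expired
```

Calling `purge()` from a periodic timer would cover both the "periodic purge" 
and the "ttl" variants. A purged invoker that comes back simply reappears 
through `update(invoker_id, "up")`.
----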
2020-11-03 21:36:35 UTC - Brendan Doyle: I'm trying to figure out how I can get 
more insight into how severely this may be impacting our uniform distribution 
because it is 20% of our invoker fleet that is considered `offline`. One 
question is the kafka topic: that doesn't get removed, right? When the invoker 
gets re-added, will it use the same invoker number and so match the same kafka 
topic? We don't want to end up creating infinite kafka topics.
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1604439395225700?thread_ts=1604432801.220900&cid=C3TPCAQG1
----
2020-11-03 21:38:58 UTC - Rodric Rabbah: if the queue is empty, i wouldn’t 
worry about it, there’s not much state associated with empty topics to become 
an issue

i don’t recall if there’s an expiration on messages in the queue though, so if 
it’s not empty those messages persist until expired
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1604439538225900?thread_ts=1604432801.220900&cid=C3TPCAQG1
----
2020-11-03 21:41:25 UTC - Dominic Kim: > could you sanity check me then? We 
> call `/invokers` api. It returns 0-33 and says 0-5 are offline. `/invokers` 
> calls `invokerHealth()` in the load balancer and `invokerHealth()` just returns 
> `_invokers` which is taken from the cluster state. So that implies to me that 
> the offline count is used towards the scheduling hashing algorithm since 
> `updateInvokers()` in the cluster management just does a take on `_invokers`

my bad.
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1604439685226200?thread_ts=1604432801.220900&cid=C3TPCAQG1
----
2020-11-03 21:42:25 UTC - Dominic Kim: After reading the code, it seems it 
would also affect the number of managed/blackbox invokers.
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1604439745226600?thread_ts=1604432801.220900&cid=C3TPCAQG1
----
2020-11-03 21:43:42 UTC - Brendan Doyle: ^ that is true. We don't use blackbox 
so that doesn't affect us but it would affect those fractions
white_check_mark : Dominic Kim
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1604439822226900?thread_ts=1604432801.220900&cid=C3TPCAQG1
----
2020-11-03 21:45:27 UTC - Rodric Rabbah: @Dominic Kim curious, in your pull 
model/new scheduler what would happen? it’s a no-op, right, since it just pulls 
from invokers that are available
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1604439927227200?thread_ts=1604432801.220900&cid=C3TPCAQG1
----
2020-11-03 21:46:57 UTC - Dominic Kim: In the pull model, the health status of 
invokers is managed by ETCD with Leases. Each invoker periodically sends a 
keepalive for its lease. If no keepalive is received for a certain time, for 
example, 10s, then the health data is removed.
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1604440017227500?thread_ts=1604432801.220900&cid=C3TPCAQG1
----
2020-11-03 21:47:16 UTC - Dominic Kim: Schedulers will only schedule container 
creation requests to healthy invokers.
+1 : Rodric Rabbah
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1604440036227700?thread_ts=1604432801.220900&cid=C3TPCAQG1
----
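The lease behaviour described above can be modelled roughly like this. The 
sketch is an in-memory Python simulation of the semantics only; it does not 
talk to ETCD or use any ETCD client, and `LeaseRegistry`, `keepalive` and 
`healthy` are hypothetical names. The 10s default is the example value from 
the thread.

```python
import time


class LeaseRegistry:
    """In-memory stand-in for lease-backed invoker health data."""

    def __init__(self, ttl_seconds=10.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock                    # injectable so it can be tested
        self.expires_at = {}                  # invoker id -> lease expiry time

    def keepalive(self, invoker_id):
        """Create or refresh the lease for an invoker."""
        self.expires_at[invoker_id] = self.clock() + self.ttl

    def healthy(self):
        """Drop expired leases and return the invokers that are still healthy."""
        now = self.clock()
        self.expires_at = {
            invoker_id: expiry
            for invoker_id, expiry in self.expires_at.items()
            if expiry > now
        }
        return sorted(self.expires_at)


if __name__ == "__main__":
    registry = LeaseRegistry(ttl_seconds=10.0)
    registry.keepalive("invoker0")
    print(registry.healthy())                 # only invokers with a live lease
```

Because an invoker that stops sending keepalives disappears from the registry 
on its own, a scheduler that only picks from `healthy()` never sees stale 
entries.
----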
2020-11-03 21:47:56 UTC - Dominic Kim: Invokers are supposed to respond to the 
container creation request, and if no response is received for some time, 
schedulers retry sending messages to other invokers.
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1604440076228000?thread_ts=1604432801.220900&cid=C3TPCAQG1
----
2020-11-03 21:49:16 UTC - Dominic Kim: Seems finally we can release the core 
1.0.0.
partyparrot : Rodric Rabbah
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1604440156228300?thread_ts=1604432801.220900&cid=C3TPCAQG1
----
2020-11-03 21:49:31 UTC - Dominic Kim: I will continue working on the scheduler 
contribution.
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1604440171228500?thread_ts=1604432801.220900&cid=C3TPCAQG1
----
2020-11-03 21:51:22 UTC - Rodric Rabbah: once you do that we can break 
everything :smile:
i have lots of stuff to start adding
sassyparrot : Dominic Kim
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1604440282228800?thread_ts=1604432801.220900&cid=C3TPCAQG1
----
2020-11-03 21:52:01 UTC - Brendan Doyle: `i have lots of stuff to start adding` 
- like what :eyes:
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1604440321229100?thread_ts=1604432801.220900&cid=C3TPCAQG1
----
2020-11-03 21:52:40 UTC - Rodric Rabbah: :sweat_smile:
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1604440360229300?thread_ts=1604432801.220900&cid=C3TPCAQG1
----
2020-11-03 21:53:17 UTC - Rodric Rabbah: stateful function support, functions 
in isolates, support for Jamstack (serve static content)
+1 : Dominic Kim
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1604440397229500?thread_ts=1604432801.220900&cid=C3TPCAQG1
----
2020-11-03 21:53:48 UTC - Brendan Doyle: `stateful function support` - 
:exploding_head:
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1604440428229700?thread_ts=1604432801.220900&cid=C3TPCAQG1
----
2020-11-03 21:54:09 UTC - Brendan Doyle: what are functions in isolates?
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1604440449229900?thread_ts=1604432801.220900&cid=C3TPCAQG1
----
2020-11-03 21:59:19 UTC - Rodric Rabbah: isolates -> not containers
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1604440759230100?thread_ts=1604432801.220900&cid=C3TPCAQG1
----
2020-11-03 21:59:30 UTC - Rodric Rabbah: uses v8, similar to cloudflare workers
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1604440770230300?thread_ts=1604432801.220900&cid=C3TPCAQG1
----
2020-11-03 21:59:37 UTC - Rodric Rabbah: better compute density
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1604440777230500?thread_ts=1604432801.220900&cid=C3TPCAQG1
----
2020-11-03 21:59:48 UTC - Rodric Rabbah: this is work we prototyped with Adobe, 
a bit overdue to upstream
https://openwhisk-team.slack.com/archives/C3TPCAQG1/p1604440788230700?thread_ts=1604432801.220900&cid=C3TPCAQG1
----
