mdeuser commented on a change in pull request #3778: Add documentation to the
loadbalancer.
URL:
https://github.com/apache/incubator-openwhisk/pull/3778#discussion_r197457413
##########
File path:
core/controller/src/main/scala/whisk/core/loadBalancer/ShardingContainerPoolBalancer.scala
##########
@@ -45,10 +45,77 @@ import scala.concurrent.{ExecutionContext, Future, Promise}
import scala.util.{Failure, Success}
/**
- * A loadbalancer that uses "horizontal" sharding to not collide with fellow loadbalancers.
+ * A loadbalancer that schedules workload based on a hashing algorithm.
+ *
+ * ## Algorithm
+ *
+ * The loadbalancer schedules workload based on a hashing algorithm. First, a hash is calculated for every
+ * namespace + action pair and then bounded to the number of invokers available. The resulting index is the
+ * so-called "home invoker". This is the invoker where the following progression will **always** start. If this
+ * invoker is healthy and has capacity, the request is scheduled to it. If either of these prerequisites does not
+ * hold, the index is incremented by a step-size. The available step-sizes are all the coprime numbers smaller
+ * than the number of invokers available (coprime, to minimize collisions while progressing through the
+ * invokers). The step-size is picked using the same hash calculated above, bounded to the number of step-sizes
+ * available. The home-invoker-index is then incremented by the step-size and the checks (healthy + capacity) are
+ * repeated for the invoker we land on. This procedure continues until all invokers have been checked, at which
+ * point the "overload" strategy is employed: a healthy invoker is chosen at random.
+ *
+ * An example:
+ * - availableInvokers: 10 (all healthy)
+ * - stepSizes: 1, 3, 7 (note how 2 and 5 are not part of this because they are not coprime to 10)
+ * - hash: 13
+ * - homeInvoker: 13 % 10 = 3
+ * - stepSizeIndex: 13 % 3 = 1 => stepSize = 3
+ *
+ * Progression to check the invokers: 3, 6, 9, 2, 5, 8, 1, 4, 7, 0 --> done
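The worked example above can be sketched in Scala. This is an illustrative sketch, not the actual implementation; `stepSizesFor` and `probeOrder` are hypothetical helper names. Note that the sketch additionally keeps only step-sizes that are pairwise coprime to each other, which reproduces the example's list of 1, 3, 7 for 10 invokers (plain coprimality to 10 alone would also admit 9).

```scala
object ShardingSketch {
  @annotation.tailrec
  def gcd(a: Int, b: Int): Int = if (b == 0) a else gcd(b, a % b)

  // Candidate step-sizes for n invokers: numbers in [1, n) coprime to n,
  // additionally kept pairwise coprime to each other (assumption) so the
  // list for n = 10 comes out as 1, 3, 7 as in the example above.
  def stepSizesFor(n: Int): IndexedSeq[Int] =
    (1 until n).foldLeft(IndexedSeq.empty[Int]) { (acc, i) =>
      if (gcd(i, n) == 1 && acc.forall(gcd(_, i) == 1)) acc :+ i else acc
    }

  // Order in which invokers are probed for a given hash (hypothetical helper):
  // start at the home invoker and keep adding the hash-selected step-size.
  def probeOrder(hash: Int, invokers: Int): IndexedSeq[Int] = {
    val steps = stepSizesFor(invokers)
    val home = hash % invokers
    val step = steps(hash % steps.size)
    (0 until invokers).map(i => (home + i * step) % invokers)
  }
}
```

Because the step-size is coprime to the invoker count, the progression visits every invoker exactly once before wrapping around.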
+ *
+ * This heuristic is based on the assumption that the chance of getting a warm container is highest on the home
+ * invoker and degrades with every hop made. The hashing ensures that all loadbalancers in a cluster will always
+ * pick the same home invoker and follow the same progression for a given action.
+ *
+ * Known caveats:
+ * - This assumption is not always true. For instance, two heavy workloads landing on the same home invoker can
+ *   displace each other, which results in many cold starts because the invoker evicts each workload's containers
+ *   to make space for the respective "other" workload. Future work could be to keep a buffer of the invokers
+ *   last scheduled to for each action and prefer to pick the most recent one, then the second-most-recent, and
+ *   so forth.
+ *
+ * ## Capacity checking
+ *
+ * Capacity is determined by what the loadbalancer thinks it has scheduled to each invoker. Upon scheduling, an
+ * entry is made to update the books and a slot is taken from a Semaphore. That slot is only released after the
+ * response from the invoker (active-ack) arrives **or** after the active-ack times out.
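A minimal sketch of the bookkeeping described above, assuming a plain `java.util.concurrent.Semaphore` per invoker; `InvokerSlots` is a hypothetical name, not the type the PR uses.

```scala
import java.util.concurrent.Semaphore

// Hypothetical per-invoker bookkeeping: a slot is taken when an activation is
// scheduled to the invoker and released when its active-ack arrives or times out.
final class InvokerSlots(capacity: Int) {
  private val semaphore = new Semaphore(capacity)

  // Non-blocking: true while the loadbalancer still believes there is capacity.
  def tryAcquire(): Boolean = semaphore.tryAcquire()

  // Called once per activation, on active-ack arrival *or* active-ack timeout.
  def release(): Unit = semaphore.release()

  def available: Int = semaphore.availablePermits()
}
```

Releasing on timeout is exactly what the caveats below describe: a timed-out active-ack frees a slot even though the invoker may still be working on the activation.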
+ *
+ * Known caveats:
+ * - In an overload scenario, activations are queued directly to the invokers, which makes the active-ack timeout
+ *   unpredictable. Timing out active-acks in that case can cause the loadbalancer to prematurely assign new load
+ *   to an overloaded invoker, which can cause uneven queues.
+ * - The same is true if an invoker gets slow for whatever reason (docker getting slow, machine overloaded). The
+ *   queue on such an invoker will slowly grow if it is still sending pings but handles the load so slowly that
+ *   the active-acks time out. The loadbalancer will again think there is capacity when there is none.
+ * Both caveats could be solved in future work by not queueing to invoker topics on overload, but queueing to a
+ * centralized overflow topic instead. Timing out an active-ack could then be treated as a system error, as
+ * described in the following.
+ *
+ * ## Invoker health checking
+ *
+ * Invoker health is determined via a Kafka-based protocol, where each invoker pings the loadbalancer every
+ * second. If no ping is seen for a defined amount of time, the invoker is considered "Offline".
+ * Moreover, results from all activations are inspected. If more than 3 out of the last 10 activations contained
+ * system errors, the invoker is considered "Unhealthy". While an invoker is unhealthy, no user workload is sent
+ * to it, but test-actions are sent by the loadbalancer. If the system-error count in the last 10 activations
+ * falls below 3 again, the invoker is considered "Healthy".
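The sliding-window error check can be sketched as follows. This is a simplified illustration: `HealthWindow` is a hypothetical name, it collapses the hysteresis described above (more than 3 errors for "Unhealthy", below 3 for "Healthy") into a single threshold, and the real protocol additionally tracks the "Offline" state via ping timeouts.

```scala
// Hypothetical sliding window over the last 10 activation results: an invoker
// counts as unhealthy while more than 3 of them ended in a system error.
final class HealthWindow(windowSize: Int = 10, errorThreshold: Int = 3) {
  private var results = Vector.empty[Boolean] // true = system error

  // Record one activation result, keeping only the most recent windowSize.
  def record(systemError: Boolean): Unit =
    results = (results :+ systemError).takeRight(windowSize)

  def isHealthy: Boolean = results.count(identity) <= errorThreshold
}
```

Keeping only the last 10 results means a formerly unhealthy invoker recovers automatically once enough successful test-actions push the errors out of the window.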
+ *
+ * ## Horizontal sharding
+ *
+ * Sharding is employed to avoid both loadbalancers to have to share any data,
because the metrics used in scheduling
Review comment:
"to avoid both loadbalancers ~to have~ having to share any data."
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services