mdeuser commented on a change in pull request #3778: Add documentation to the 
loadbalancer.
URL: 
https://github.com/apache/incubator-openwhisk/pull/3778#discussion_r197457413
 
 

 ##########
 File path: 
core/controller/src/main/scala/whisk/core/loadBalancer/ShardingContainerPoolBalancer.scala
 ##########
 @@ -45,10 +45,77 @@ import scala.concurrent.{ExecutionContext, Future, Promise}
 import scala.util.{Failure, Success}
 
 /**
- * A loadbalancer that uses "horizontal" sharding to not collide with fellow 
loadbalancers.
+ * A loadbalancer that schedules workload based on a hashing-algorithm.
+ *
+ * ## Algorithm
+ *
 + * The loadbalancer schedules workload based on a hashing algorithm. First, a hash is calculated for every
 + * namespace + action pair and bounded to the number of invokers available. The resulting index is the so-called
 + * "home invoker". This is the invoker where the following progression will **always** start. If this invoker is
 + * healthy and has capacity, the request is scheduled to it. If either prerequisite is not met, the index is
 + * incremented by a step-size. The step-sizes available are all coprime numbers smaller than the number of invokers
 + * available (coprime, to minimize collisions while progressing through the invokers). The step-size is picked by
 + * the same hash calculated above, bounded to the number of step-sizes available. The home-invoker index is then
 + * incremented by the step-size and the checks (healthy + capacity) are performed on the invoker we land on. This
 + * procedure repeats until all invokers have been checked, at which point the "overload" strategy is employed:
 + * choose a healthy invoker randomly.
+ *
+ * An example:
+ * - availableInvokers: 10 (all healthy)
 + * - stepSizes: 1, 3, 7 (note how 2 and 5 are not part of this because they are not coprime to 10)
+ * - hash: 13
+ * - homeInvoker: 13 % 10 = 3
+ * - stepSizeIndex: 13 % 3 = 1 => stepSize = 3
+ *
+ * Progression to check the invokers: 3, 6, 9, 2, 5, 8, 1, 4, 7, 0 --> done
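The hash-and-hop walk above could be sketched roughly as follows. This is a hypothetical, simplified sketch, not the actual `ShardingContainerPoolBalancer` code; `pairwiseCoprimesSmallerThan` and `progression` are illustrative names:

```scala
// Simplified sketch of the scheduling walk (illustrative names, not the real API).
object SchedulingSketch {
  /** All numbers in [1, n) coprime to n and pairwise coprime to each other. */
  def pairwiseCoprimesSmallerThan(n: Int): List[Int] =
    (1 until n).foldLeft(List.empty[Int]) { (acc, i) =>
      if (gcd(i, n) == 1 && acc.forall(gcd(i, _) == 1)) acc :+ i else acc
    }

  private def gcd(a: Int, b: Int): Int = if (b == 0) a else gcd(b, a % b)

  /** Order in which invokers are checked: start at the home invoker, hop by the step size. */
  def progression(hash: Int, invokers: Int): Seq[Int] = {
    val stepSizes = pairwiseCoprimesSmallerThan(invokers)
    val home = hash % invokers               // "home invoker" index
    val step = stepSizes(hash % stepSizes.size) // step size picked by the same hash
    (0 until invokers).map(i => (home + i * step) % invokers)
  }
}
```

With `hash = 13` and 10 invokers this reproduces the worked example: step sizes `1, 3, 7`, home invoker `3`, progression `3, 6, 9, 2, 5, 8, 1, 4, 7, 0`.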
+ *
 + * This heuristic is based on the assumption that the chance to get a warm container is best on the home invoker
 + * and degrades with every hop. The hashing makes sure that all loadbalancers in a cluster will always pick the
 + * same home invoker and do the same progression for a given action.
+ *
+ * Known caveats:
 + * - This assumption is not always true. For instance, two heavy workloads landing on the same invoker can displace
 + *   each other, which results in many cold starts because the invoker evicts each workload's containers to make
 + *   space for the other. Future work could be to keep a buffer of the invokers last scheduled to for each action
 + *   and to prefer picking the most recent one, then the second most recent, and so forth.
+ *
+ * ## Capacity checking
+ *
+ * Capacity is determined by what the loadbalancer thinks it scheduled to each 
invoker. Upon scheduling, an entry is
+ * made to update the books and a slot in a Semaphore is taken. That Semaphore 
is only released after the response from
+ * the invoker (active-ack) arrives **or** after the active-ack times out.
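The bookkeeping described above could be sketched as follows. This is a hypothetical, simplified class using `java.util.concurrent.Semaphore` for illustration; the actual balancer keeps its own semaphore implementation:

```scala
import java.util.concurrent.Semaphore

// Sketch of per-invoker capacity bookkeeping: a slot is taken when an activation
// is scheduled and released when the active-ack arrives or times out.
final class InvokerSlots(capacity: Int) {
  private val semaphore = new Semaphore(capacity)

  /** Try to reserve a slot; returns false if the invoker is at capacity. */
  def tryAcquire(): Boolean = semaphore.tryAcquire()

  /** Release the slot once the active-ack arrives or times out. */
  def release(): Unit = semaphore.release()

  def availablePermits: Int = semaphore.availablePermits()
}
```

In this sketch, a scheduling attempt that fails `tryAcquire()` would move on to the next invoker in the progression.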
+ *
+ * Known caveats:
+ * - In an overload scenario, activations are queued directly to the invokers, 
which makes the active-ack timeout
+ *   unpredictable. Timing out active-acks in that case can cause the 
loadbalancer to prematurely assign new load to an
+ *   overloaded invoker, which can cause uneven queues.
 + * - The same is true if an invoker gets slow for whatever reason (Docker getting slow, machine overloaded). The
 + *   queue on such an invoker will slowly grow if it is still healthy enough to send pings but handles the load so
 + *   slowly that the active-acks time out. The loadbalancer will again think there is capacity when there is none.
 + *   Both caveats could be solved in future work by not queueing to invoker topics on overload, but queueing on a
 + *   centralized overflow topic instead. Timing out an active-ack could then be treated as a system error, as
 + *   described in the following.
+ *
+ * ## Invoker health checking
+ *
 + * Invoker health is determined via a Kafka-based protocol, where each invoker pings the loadbalancer every second.
 + * If no ping is seen for a defined amount of time, the invoker is considered "Offline".
 + * Moreover, results from all activations are inspected. If more than 3 out of the last 10 activations contained
 + * system errors, the invoker is considered "Unhealthy". If an invoker is unhealthy, no user workload is sent to
 + * it; instead, test-actions are sent by the loadbalancer. If the number of system errors in the last 10
 + * activations falls below 3, the invoker is considered "Healthy" again.
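The health classification could be sketched as follows. This is a hypothetical sketch with illustrative names and a hardcoded offline timeout; the window size and error threshold follow the description above:

```scala
// Sketch of invoker health classification: a window of the last 10 activation
// outcomes (true = system error) plus a ping-based offline check.
object InvokerHealthSketch {
  sealed trait HealthState
  case object Healthy extends HealthState
  case object Unhealthy extends HealthState
  case object Offline extends HealthState

  val WindowSize = 10     // inspect the last 10 activations
  val ErrorThreshold = 3  // more than 3 system errors => Unhealthy

  /** Classify from the time since the last ping and recent activation outcomes. */
  def classify(millisSinceLastPing: Long,
               outcomes: Seq[Boolean],
               offlineAfterMillis: Long = 10000L): HealthState =
    if (millisSinceLastPing > offlineAfterMillis) Offline
    else if (outcomes.takeRight(WindowSize).count(identity) > ErrorThreshold) Unhealthy
    else Healthy
}
```

Under this sketch, an unhealthy invoker would receive only the balancer's test-actions until the error count in its window drops back below the threshold.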
+ *
+ * ## Horizontal sharding
+ *
+ * Sharding is employed to avoid both loadbalancers to have to share any data, 
because the metrics used in scheduling
 
 Review comment:
   "to avoid both loadbalancers ~to have~ having to share any data."

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
