mdeuser commented on a change in pull request #3778: Add documentation to the loadbalancer.
URL: https://github.com/apache/incubator-openwhisk/pull/3778#discussion_r197782166
 
 

 ##########
 File path: core/controller/src/main/scala/whisk/core/loadBalancer/ShardingContainerPoolBalancer.scala
 ##########
 @@ -45,10 +45,85 @@ import scala.concurrent.{ExecutionContext, Future, Promise}
 import scala.util.{Failure, Success}
 
 /**
 - * A loadbalancer that uses "horizontal" sharding to not collide with fellow loadbalancers.
 + * A loadbalancer that schedules workload based on a hashing algorithm.
+ *
+ * ## Algorithm
+ *
 + * First, a hash is calculated for every namespace + action pair, and an invoker is picked based on that hash
 + * (`hash % numInvokers`). The resulting index is the so-called "home invoker": the invoker where the following
 + * progression will **always** start. If this invoker is healthy (see "Invoker health checking") and has free
 + * capacity (see "Capacity checking"), the request is scheduled to it.
+ *
 + * If either of these prerequisites does not hold, the index is incremented by a step size. The available step
 + * sizes are all numbers smaller than the number of invokers that are coprime to it (coprime, to minimize
 + * collisions while progressing through the invokers). The step size is picked via the same hash calculated
 + * above (`hash % numStepSizes`). The home-invoker index is then incremented by the step size, and the checks
 + * (healthy + capacity) are performed on the invoker we land on.
+ *
 + * This procedure is repeated until all invokers have been checked, at which point the "overload" strategy is
 + * employed: a healthy invoker is chosen randomly. In a steadily running system, overload means that no invoker
 + * has capacity left to schedule the current request to.
+ *
 + * If no invokers are available, or if there are no healthy invokers in the system, the loadbalancer returns an
 + * error stating that no invokers are available to take any work. Requests are not queued anywhere in this case.
+ *
 + * An example:
 + * - availableInvokers: 10 (all healthy)
 + * - hash: 13
 + * - homeInvoker: hash % availableInvokers = 13 % 10 = 3
 + * - stepSizes: 1, 3, 7 (note that 2 and 5 are not included because they are not coprime to 10)
 + * - stepSizeIndex: hash % numStepSizes = 13 % 3 = 1 => stepSize = 3
+ *
+ * Progression to check the invokers: 3, 6, 9, 2, 5, 8, 1, 4, 7, 0 --> done
+ *
 + * This heuristic is based on the assumption that the chance of getting a warm container is highest on the home
 + * invoker and degrades with every step taken. The hashing ensures that all loadbalancers in a cluster always
 + * pick the same home invoker and follow the same progression for a given action.
+ *
 + * Known caveats:
 + * - This assumption does not always hold. For instance, two heavy workloads landing on the same invoker can
 + *   evict each other's containers, resulting in many cold starts, because the invoker removes the containers
 + *   of one workload to make space for the other. Future work could keep a buffer of the invokers last
 + *   scheduled for each action and prefer picking those (the last one, then the second-to-last, and so forth).
+ *
 + * ## Capacity checking
 + *
 + * Capacity is determined by what the loadbalancer believes it has scheduled to each invoker. Upon scheduling,
 + * the books are updated and a slot in a Semaphore is taken. That Semaphore is only released once the response
 + * from the invoker (active-ack) arrives **or** the active-ack times out.
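As a rough, stand-alone sketch of the progression described in the "Algorithm" section of the diff above (names like `stepSizes` and `progression` are illustrative, not the actual helpers in `ShardingContainerPoolBalancer`; the pairwise-coprimality filter is inferred from the example, where 9 is absent from the step sizes for 10 invokers):

```scala
// Illustrative sketch only; the real balancer's helpers may differ.
object ProgressionSketch {
  def gcd(a: Int, b: Int): Int = if (b == 0) a else gcd(b, a % b)

  /** Numbers below n that are coprime to n (and to each other, matching
    * the docs' example where 9 is not a step size for n = 10). */
  def stepSizes(n: Int): Seq[Int] =
    (1 until n).foldLeft(Vector.empty[Int]) { (acc, c) =>
      if (gcd(c, n) == 1 && acc.forall(gcd(_, c) == 1)) acc :+ c else acc
    }

  /** The order in which invokers are tried for a given hash. */
  def progression(hash: Int, numInvokers: Int): Seq[Int] = {
    val sizes = stepSizes(numInvokers)
    val home  = hash % numInvokers           // the "home invoker"
    val step  = sizes(hash % sizes.size)     // step size picked by the same hash
    (0 until numInvokers).map(i => (home + i * step) % numInvokers)
  }

  def main(args: Array[String]): Unit = {
    // Reproduces the example from the docs: hash = 13, 10 healthy invokers.
    println(stepSizes(10))       // Vector(1, 3, 7)
    println(progression(13, 10)) // Vector(3, 6, 9, 2, 5, 8, 1, 4, 7, 0)
  }
}
```

Because the step size is coprime to the invoker count, the progression visits every invoker exactly once before the "overload" fallback kicks in.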
 
 Review comment:
  Is there some sort of fixed capacity limit/value the loadbalancer uses to determine if the invoker has capacity to receive another request? The invoker has a max of 16 request slots; the sharded max slots depend on the number of controllers...
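As a hypothetical illustration of the semaphore-based bookkeeping the diff describes (the slot count of 16 echoes the reviewer's question and is an assumption, not a value taken from the implementation; only the take-on-schedule / release-on-active-ack pattern is shown):

```scala
import java.util.concurrent.Semaphore

// Sketch of per-invoker capacity bookkeeping; the real balancer's data
// structures differ.
final class InvokerSlots(maxSlots: Int) {
  private val slots = new Semaphore(maxSlots)

  /** Non-blocking: true if the invoker still had free capacity. */
  def tryAcquire(): Boolean = slots.tryAcquire()

  /** Called when the active-ack arrives or times out. */
  def release(): Unit = slots.release()

  def available: Int = slots.availablePermits()
}

object CapacitySketch {
  def main(args: Array[String]): Unit = {
    val invoker = new InvokerSlots(16)              // assumed slot count
    val taken = (1 to 16).map(_ => invoker.tryAcquire())
    assert(taken.forall(identity))                  // all 16 slots taken
    assert(!invoker.tryAcquire())                   // full: progression moves on
    invoker.release()                               // active-ack frees a slot
    assert(invoker.tryAcquire())
  }
}
```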

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
