> On 18 May 2017, at 13:22, Rainer Jung <rainer.j...@kippdata.de> wrote:
> 
> On 18.05.2017 at 19:46, Jim Jagielski wrote:
>> Based on feedback from various sessions:
>> 
>> o A new kind of "hot standby" in mod_proxy which kicks
>>   in whenever a worker moves out of the pool (i.e., doesn't
>>   wait until all workers are out)... a la a redundant
>>   hard drive.
> 
> Maybe "spare worker" (and we could have more than one such spare).

Exactly. I am already working on some code for this, though it seems it will 
necessarily be a bit convoluted in the current codebase.

The way we treat a "hot standby" in mod_proxy_balancer is as a last-ditch 
effort to return something: *all* workers are unavailable, and only then do we 
use the hot standby. (This problem can actually be solved a different way, by 
setting a high value for lbset; see the sketch below.)
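
Something along these lines, for example. This is just a sketch: it reuses the 
addresses from the examples below and relies on the main members keeping the 
default lbset=0, since a higher lbset is only consulted once every member of 
the lower set is unusable.

<Proxy "balancer://mycluster">
 BalancerMember "http://192.168.1.50:80" lbset=0
 BalancerMember "http://192.168.1.51:80" lbset=0
 BalancerMember "http://192.168.1.52:80" lbset=0
 # only tried once all lbset=0 members are unusable
 BalancerMember "http://192.168.1.53:80" lbset=1
</Proxy>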

In my mind, though, what is proposed here is how I would actually expect a 
"hot standby" to work. Think of it more like a "hot spare" in disk RAID terms. 
That is, if *any* worker is unavailable, the hot spare will be used (or at 
least added to the list of potential workers... still to be determined by the 
lbmethod implementation).

Example:

<Proxy "balancer://mycluster">
 BalancerMember "http://192.168.1.50:80"
 BalancerMember "http://192.168.1.51:80"
 BalancerMember "http://192.168.1.52:80"
 BalancerMember "http://192.168.1.53:80" status=+H
</Proxy>

In this case, .53 will only get used if .50, .51, and .52 are *all* unavailable.

<Proxy "balancer://mycluster">
 BalancerMember "http://192.168.1.50:80"
 BalancerMember "http://192.168.1.51:80"
 BalancerMember "http://192.168.1.52:80"
 # new "hot spare" status
 BalancerMember "http://192.168.1.53:80" status=+R
 BalancerMember "http://192.168.1.54:80" status=+R
</Proxy>

In this case, if .50 becomes unavailable, .53 (or .54 depending on 
implementation) will be treated as an available worker for the lbmethod to 
potentially choose. If 2 or more of .50, .51, and .52 become unavailable, both 
.53 and .54 would be available to be chosen.

So, instead of having a single fallback option when *all* workers are dead, we 
will have a way of trying to ensure that a specific number of workers (3 in the 
example above) are always available... just like a hot spare drive plugs into 
the RAID array when one of the members dies. In our case, though, once the main 
worker recovers, the hot spare will go back to being a hot spare (except for 
requests whose routes match the spare, which would stick to it).
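
For that route case, I'm picturing something like this (again just a sketch; 
the route names and the stickysession setup are illustrative, nothing is 
decided):

<Proxy "balancer://mycluster">
 ProxySet stickysession=ROUTEID
 BalancerMember "http://192.168.1.50:80" route=w1
 BalancerMember "http://192.168.1.51:80" route=w2
 BalancerMember "http://192.168.1.52:80" route=w3
 # the spare carries its own route, so sessions pinned to it
 # keep going there even after a main worker recovers
 BalancerMember "http://192.168.1.53:80" status=+R route=spare1
</Proxy>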

Comments welcome.
