> On 1 Jun 2017, at 07:55, Jim Jagielski <j...@jagunet.com> wrote:
> 
> I really like where this is going... I just have a few questions:
> 
>  1. The style, esp. with array usage, is different; e.g. APR_ARRAY_PUSH
>     and APR_ARRAY_IDX... any particular reason why?

Well, we don't seem to be entirely consistent across the codebase about using 
elts and apr_array_*() directly vs. the helper #defines. I was doing everything 
with direct pointer access and arithmetic until I started looking at the 
heartbeat lbmethod, which uses some of the helpers. IMNSHO, they are a little 
cleaner and easier to use and read. I got lost in dereferencing hell with all 
of the `&((*...) **)->(*)'! :-) The helpers are there, and I think they are 
convenient and worth using, but I'm not digging through all of this code as 
regularly as some of the others on here. If the standard/consensus is the other 
way, it's easy to change.
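
For comparison, here is the same push and read in both styles (a throwaway 
sketch of my own, not code from the patch):

    #include "apr_tables.h"

    /* Contrast the helper macros with direct elts access on the same
     * apr_array_header_t. Purely illustrative. */
    static const char *demo(apr_pool_t *p)
    {
        apr_array_header_t *arr = apr_array_make(p, 4, sizeof(const char *));

        /* Helper style (apr_tables.h): */
        APR_ARRAY_PUSH(arr, const char *) = "alpha";

        /* The same push with direct pointer access: */
        *(const char **)apr_array_push(arr) = "beta";

        /* Helper read; the direct form is ((const char **)arr->elts)[0]. */
        return APR_ARRAY_IDX(arr, 0, const char *);
    }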


>  2. I understand the logic behind creating the arrays, but doesn't
>     this increase the overhead? We go thru the full list of workers
>     one time, and then go thru the list of avail workers and spares
>     right after that. Assume that all workers are available; doesn't
>     it mean we go thru that list 2x?

If everything is available, the old implementation goes through the list only 
once and picks a worker. This new implementation goes through the list once and 
returns a *subset* that the lbmethod then has to loop through to pick a worker. 
Worst case (e.g. X workers in lbset 0, all available), yes, we would now do 2X.

In a more complex scenario (e.g. X workers, all available, in each of lbsets 0, 
1, and 2), we do 3X (1st loop) + X (2nd loop) = 4X, not 2 * 3X = 6X. So, yes, 
there is slightly more overhead...but the second loop (in the lbmethod) has at 
most one lbset's worth of workers and isn't really doing anything except an 
arithmetic comparison to pick the best of them.
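
To be concrete, that second loop ends up being roughly this (a sketch; assume 
the array holds proxy_worker pointers, and the comparison is just a 
byrequests-style stand-in):

    #include "mod_proxy.h"

    /* Pick the "best" candidate from the usable subset with a simple
     * arithmetic comparison. Illustrative only. */
    static proxy_worker *pick_best(apr_array_header_t *usable)
    {
        proxy_worker *best = NULL;
        int i;

        for (i = 0; i < usable->nelts; i++) {
            proxy_worker *w = APR_ARRAY_IDX(usable, i, proxy_worker *);
            if (!best || w->s->lbstatus > best->s->lbstatus) {
                best = w;
            }
        }
        return best;
    }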

If a number of workers are unavailable, however, it's different. The old 
implementation loops the whole list twice per lbset when it doesn't find a 
worker -- once for regular workers and once for standbys -- whereas this new 
implementation still loops the whole list only once per lbset. In the example 
above, that's 2 * 3X = 6X for the old code vs. 3X + X = 4X for the new.

The only way I can think of to avoid this without going back to duplicating 
code across lbmethods -- which I would argue is worse -- would be to have the 
lbmethod provide a callback function and context pointer to 
ap_proxy_balancer_usable_workers() that gets called during iteration to pick 
the best member in flight. I am open to that...it just seemed to add 
unnecessary complexity.
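
Something like this, shape-wise (hypothetical names and signatures of my own, 
not anything in the patch):

    #include "mod_proxy.h"

    /* Hypothetical comparator the lbmethod would provide; returns the
     * better of the two candidates. */
    typedef proxy_worker *(*lb_compare_fn)(proxy_worker *best_so_far,
                                           proxy_worker *candidate,
                                           void *ctx);

    /* Hypothetical single-pass variant: walk the worker list once and
     * call back for each usable member in flight. */
    PROXY_DECLARE(proxy_worker *)
    ap_proxy_balancer_pick_usable_worker(proxy_balancer *balancer,
                                         request_rec *r,
                                         lb_compare_fn compare,
                                         void *ctx);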

Regardless, even in the worst case, we are looking at what, iterating over 6 
pointers instead of 3, or 10 instead of 5? We probably have lower-hanging fruit 
across the request lifecycle code for improving performance than saving some 
arithmetic on a handful of structs, no? ;-)


>> On May 31, 2017, at 1:44 PM, Jim Riggs <apache-li...@riggs.me> wrote:
>> 
>>> On 23 May 2017, at 09:16, Jim Riggs <apache-li...@riggs.me> wrote:
>>> 
>>>> On 18 May 2017, at 13:22, Rainer Jung <rainer.j...@kippdata.de> wrote:
>>>> 
>>>> Am 18.05.2017 um 19:46 schrieb Jim Jagielski:
>>>>> Based on feedback from various sessions:
>>>>> 
>>>>> o A new kind of "hot standby" in mod_proxy which kicks
>>>>> in whenever a worker moves out of the pool (i.e., doesn't
>>>>> wait until all workers are out)... a la a redundant
>>>>> hard drive.
>>>> 
>>>> Maybe "spare worker" (and we could have more than one such spare).
>>> 
>>> Exactly. I am already working on some code for this, though it seems it 
>>> will necessarily be a bit convoluted in the current codebase.
>>> 
>>> The way we treat a "hot standby" in mod_proxy_balancer is as a last-ditch 
>>> effort to return something: only when *all* workers are unavailable do we 
>>> use the hot standby. (This problem can actually be solved a different way 
>>> by setting a high value for lbset.)
>>> 
>>> In my mind, though, what is proposed here is actually how I would expect a 
>>> "hot standby" to work. Think of it more like a "hot spare" in disk RAID 
>>> terms. That is, if *any* worker is unavailable, the hot spare will be used 
>>> (or at least added to the list of potential workers...still to be 
>>> determined by the lbmethod implementation).
>>> 
>>> Example:
>>> 
>>> <Proxy "balancer://mycluster">
>>>     BalancerMember "http://192.168.1.50:80"
>>>     BalancerMember "http://192.168.1.51:80"
>>>     BalancerMember "http://192.168.1.52:80"
>>>     BalancerMember "http://192.168.1.53:80" status=+H
>>> </Proxy>
>>> 
>>> In this case, .53 will only get used if .50, .51, and .52 are *all* 
>>> unavailable.
>>> 
>>> <Proxy "balancer://mycluster">
>>>     BalancerMember "http://192.168.1.50:80"
>>>     BalancerMember "http://192.168.1.51:80"
>>>     BalancerMember "http://192.168.1.52:80"
>>>     # the next two use the new "hot spare" status:
>>>     BalancerMember "http://192.168.1.53:80" status=+R
>>>     BalancerMember "http://192.168.1.54:80" status=+R
>>> </Proxy>
>>> 
>>> In this case, if .50 becomes unavailable, .53 (or .54 depending on 
>>> implementation) will be treated as an available worker for the lbmethod to 
>>> potentially choose. If 2 or more of .50, .51, and .52 become unavailable, 
>>> both .53 and .54 would be available to be chosen.
>>> 
>>> So, instead of having a single fallback option when *all* workers are dead, 
>>> we will have a way of trying to ensure that a specific number of workers (3 
>>> in the example above) are always available...just like a hot spare drive 
>>> plugs into the RAID array when one of the members dies. In our case, 
>>> though, once the main worker recovers, the hot spare will go back to being 
>>> a hot spare (except for matching routes).
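>>> 
>>> Rough shape of the spare promotion (a hypothetical sketch of the idea, not 
>>> the actual patch code):
>>> 
>>>     /* Top up the usable set with spares, one for each unavailable
>>>      * main worker, up to the configured number of main workers. */
>>>     static void fill_with_spares(apr_array_header_t *usable,
>>>                                  apr_array_header_t *spares,
>>>                                  int n_main)
>>>     {
>>>         int i;
>>>         for (i = 0; i < spares->nelts && usable->nelts < n_main; i++) {
>>>             APR_ARRAY_PUSH(usable, proxy_worker *) =
>>>                 APR_ARRAY_IDX(spares, i, proxy_worker *);
>>>         }
>>>     }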
>>> 
>>> Comments welcome.
>> 
>> 
>> As promised, balancer hot spare support: 
>> https://bz.apache.org/bugzilla/show_bug.cgi?id=61140
>> 
> 
