In our testing today, the round robin appears to spread out traffic just as
it ought to.  Thanks much for that.

In further testing we turned up a couple other things, however.

The first is that we are having problems balancing to multiple app servers
when using the hash or loadbalance algos.  I think the reason for this is
that our hash info does not exist on the initial connection or for atomic
requests.

For atomic requests (requests consisting of a preconstructed URL with a
bunch of variables passed to the app, followed by a single response
containing the requested data), there isn't necessarily distinctive
hashable information.  In this case even the source address may not be
enough given that multiple computers within the client systems may be making
the requests behind a NATing device; in our testing, all connections from a
single host end up on a single appserver (the high ID'd one if using 'hash',
a lower number if using 'loadbalance'). This would be bad in a number of
different scenarios.
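
To make the failure mode concrete, here is a tiny standalone C program
(illustrative only: hashbuf() is a djb2-style stand-in for hoststated's
hash32_buf(), and the address, seed, and host count are all made up)
showing why hashing nothing but the source address pins every NATed
client to one bucket:

#include <stdio.h>
#include <stdint.h>
#include <arpa/inet.h>

static uint32_t
hashbuf(const void *buf, size_t len, uint32_t hash)
{
	const unsigned char *p = buf;

	while (len--)
		hash = hash * 33 + *p++;	/* djb2-style mix */
	return (hash);
}

int
main(void)
{
	struct in_addr src;
	int i, nhosts = 4;

	/* every client behind the NAT shares this one public address */
	inet_pton(AF_INET, "203.0.113.7", &src);

	for (i = 0; i < 5; i++) {
		uint32_t h = hashbuf(&src, sizeof(src), 5381);
		/* same bytes in -> same hash out -> same app server */
		printf("connection %d -> host %u\n", i, h % nhosts);
	}
	return (0);
}

The same bytes go in on every connection, so the modulo never moves.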

For non-atomic requests (by far the largest segment of traffic), we use
cookies or a GET variable to maintain and recognize state depending on
whether they're using a browser or an XML API.  On initial request from the
client computers, they have neither the GET variable nor the cookie.

In the case of the browser, it's delivered to them after their initial
connection (like most session cookies I think).  Concatenated to the end of
this sessionid is a small string we use in subsequent requests to route the
traffic back to the app server the first connection was made on.  If an app
server looks at this little bit of text and sees that it names a different
host, it assumes the connection has failed over from that other host, sends
a new cookie to the browser, and restarts the session with a new, blank
slate.
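
For illustration (made-up values, not our real format), the delivered
cookie looks something like:

 Set-Cookie: sessionid=9f2c4e81a7d0app03; path=/

where the trailing "app03" is the routing string each app server compares
against its own identity.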

In the case of the XML API, it's quite similar, except that the session data
is passed to the client in the initial response to their first query.  As
part of the API, the client takes the session info and turns it into a GET
variable in subsequent requests.
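
Again with made-up values, a follow-up API request would look roughly like:

 GET /xmlapi/search?session=9f2c4e81a7d0app03&title=... HTTP/1.1

with the same routing suffix carried in the session GET variable.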

That's a long, complicated way of restating that, according to the protocol
definition we have, there appears to be no useful data for hashing on the
initial connection, and I think this is why we end up in the same bucket for
every request.  It would be nice if it fell back to roundrobin in the case
that nothing really matched for the hash additions, or at least if we could
tell it to do so as a non-default (since loadbalance always has a little bit
of hash from the source address).
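
To sketch what we're imagining (illustrative only, not a tested diff: the
RELAY_DSTMODE_HASH case label is assumed, and the HASHINIT comparison is
just a stand-in for "no protocol action fed bytes into the hash"),
relay_from_table() might do something like:

	case RELAY_DSTMODE_HASH:
		/*
		 * Hypothetical fallback: nothing was mixed into the
		 * hash on this request, so rotate like roundrobin
		 * instead of dropping every client into one bucket.
		 */
		if (p == HASHINIT) {
			if ((int)rlay->dstkey >= rlay->dstnhosts)
				rlay->dstkey = 0;
			idx = (int)rlay->dstkey++;
			break;
		}
		idx = (int)(p % rlay->dstnhosts);
		break;

Loadbalance would never hit the fallback on its own (the source address
always contributes), hence the wish for a knob to request it explicitly
there.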

The other thing we discovered in testing is that, within a protocol
definition, the query entity doesn't appear to work with the hash action,
at least as of a build of hoststated from this morning.  We were trying to
see if we could work around the hashing issue by feeding it some
client-specific data.  The following fails horribly:

 request query hash "*" from "SS_LibHash"
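
If cookie keys also accept the hash action, which we haven't verified
against the grammar, something like

 request cookie hash "sessionid"

would be our next experiment, though the cookie is of course also absent
on the very first request, as described above.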

Cheers,

;P mn


On 2007/11/26 11:36 AM, "Preston Norvell"
<[EMAIL PROTECTED]> muttered eloquently:

> Thanks much.  We're working on getting it compiled and tested.
> 
> Assuming testing goes well, our last major hurdle is the deterministic
> portion of the load balancing, which it sounds like you are thinking about
> already.
> 
> Thanks much again,
> 
> ;P mn
> 
> 
> On 2007/11/22 8:09 AM, "Reyk Floeter" <[EMAIL PROTECTED]> muttered
> eloquently:
> 
>> ok, forget about this diff - i committed the first part (roundrobin)
>> but skipped the loadbalance part because it is wrong to look at the
>> client port in this case (because i want to provide session
>> persistence).
>> 
>> On Thu, Nov 22, 2007 at 12:51:10PM +0100, Reyk Floeter wrote:
>>> - please try the attached diff, it will fix the roundrobin mode by
>>> saving the last index and traversing to the next available host.
>>> 
>>> (you can also have a look at my little test program to verify the alg:
>>> http://team.vantronix.net/~reyk/q.c)
>>> 
>>> - i'm also looking into improving the loadbalance mode. the attached
>>> diff includes the source port in loadbalance mode and the destination
>>> (relay) port in loadbalance and hash mode. make also sure that you
>>> feed in other variables if you want to get better results, for example
>>> 
>>> request hash "Host"
>>> 
>>> to feed the virtual hostname into the hash/loadbalance hash.
>>> 
>>> reyk
>>> 
>>> Index: hoststated.h
>>> ===================================================================
>>> RCS file: /cvs/src/usr.sbin/hoststated/hoststated.h,v
>>> retrieving revision 1.81
>>> diff -u -p -r1.81 hoststated.h
>>> --- hoststated.h 22 Nov 2007 10:09:53 -0000 1.81
>>> +++ hoststated.h 22 Nov 2007 11:45:00 -0000
>>> @@ -327,6 +327,7 @@ struct host {
>>>  	u_long			 up_cnt;
>>>  	int			 retry_cnt;
>>>  	struct ctl_tcp_event	 cte;
>>> +	int			 idx;
>>>  };
>>>  TAILQ_HEAD(hostlist, host);
>>>  
>>> Index: relay.c
>>> ===================================================================
>>> RCS file: /cvs/src/usr.sbin/hoststated/relay.c,v
>>> retrieving revision 1.65
>>> diff -u -p -r1.65 relay.c
>>> --- relay.c 22 Nov 2007 10:09:53 -0000 1.65
>>> +++ relay.c 22 Nov 2007 11:45:01 -0000
>>> @@ -463,6 +463,7 @@ relay_init(void)
>>>  			if (rlay->dstnhosts >= RELAY_MAXHOSTS)
>>>  				fatal("relay_init: "
>>>  				    "too many hosts in table");
>>> +			host->idx = rlay->dstnhosts;
>>>  			rlay->dsthost[rlay->dstnhosts++] = host;
>>>  		}
>>>  		log_info("adding %d hosts from table %s%s",
>>> @@ -1876,10 +1877,14 @@ relay_hash_addr(struct sockaddr_storage
>>>  		sin4 = (struct sockaddr_in *)ss;
>>>  		p = hash32_buf(&sin4->sin_addr,
>>>  		    sizeof(struct in_addr), p);
>>> +		p = hash32_buf(&sin4->sin_port,
>>> +		    sizeof(sin4->sin_port), p);
>>>  	} else {
>>>  		sin6 = (struct sockaddr_in6 *)ss;
>>>  		p = hash32_buf(&sin6->sin6_addr,
>>>  		    sizeof(struct in6_addr), p);
>>> +		p = hash32_buf(&sin6->sin6_port,
>>> +		    sizeof(sin6->sin6_port), p);
>>>  	}
>>> 
>>>  	return (p);
>>> @@ -1903,7 +1908,7 @@ relay_from_table(struct session *con)
>>>  	case RELAY_DSTMODE_ROUNDROBIN:
>>>  		if ((int)rlay->dstkey >= rlay->dstnhosts)
>>>  			rlay->dstkey = 0;
>>> -		idx = (int)rlay->dstkey++;
>>> +		idx = (int)rlay->dstkey;
>>>  		break;
>>>  	case RELAY_DSTMODE_LOADBALANCE:
>>>  		p = relay_hash_addr(&con->in.ss, p);
>>> @@ -1933,6 +1938,8 @@ relay_from_table(struct session *con)
>>>  	fatalx("relay_from_table: no active hosts, desynchronized");
>>> 
>>>   found:
>>> +	if (rlay->conf.dstmode == RELAY_DSTMODE_ROUNDROBIN)
>>> +		rlay->dstkey = host->idx + 1;
>>>  	con->retry = host->conf.retry;
>>>  	con->out.port = table->conf.port;
>>>  	bcopy(&host->conf.ss, &con->out.ss, sizeof(con->out.ss));
>>> 
> 
> --
> Preston M Norvell <[EMAIL PROTECTED]>
> Systems/Network Administrator
> Serials Solutions <http://www.serialssolutions.com>
> Phone:  (866) SERIALS (737-4257) ext 1094
> 

--
Preston M Norvell <[EMAIL PROTECTED]>
Systems/Network Administrator
Serials Solutions <http://www.serialssolutions.com>
Phone:  (866) SERIALS (737-4257) ext 1094
