In our testing today, the roundrobin mode appears to spread traffic out just as it ought to. Thanks much for that.
In further testing we turned up a couple of other things, however.

The first is that we are having problems balancing to multiple app servers when using the hash or loadbalance algos. I think the reason is that our hashable info does not exist on the initial connection or for atomic requests. For atomic requests (requests consisting of a preconstructed URL with a bunch of variables passed to the app, followed by a single response containing the requested data), there isn't necessarily any distinctive hashable information. In this case even the source address may not be enough, given that multiple computers within the client systems may be making the requests from behind a NATing device; in our testing, all connections from a single host end up on a single app server (the highest-ID'd one if using 'hash', a lower-numbered one if using 'loadbalance'). This would be bad in a number of different scenarios.

For non-atomic requests (by far the largest segment of traffic), we use cookies or a GET variable to maintain and recognize state, depending on whether the client is using a browser or the XML API. On the initial request from the client computers, they have neither the GET variable nor the cookie. In the case of the browser, the cookie is delivered to them after their initial connection (like most session cookies, I think). Concatenated to the end of this session ID is a small string we use in subsequent requests to route the traffic back to the app server the first connection was made on. If an app server looks at this little bit of text and sees that it is a string for a different host, it believes the connection has failed over from a different host, sends a new cookie to the browser, and restarts the session with a new, blank slate. The XML API case is quite similar, except that the session data is passed to the client in the initial response to its first query; as part of the API, the client takes the session info and turns it into a GET variable in subsequent requests.

That's a long, complicated way of saying that, under the protocol definition we have, there appears to be no useful data to hash on the initial connection, and I think this is why we end up in the same bucket for every request. It would be nice if it fell back to roundrobin when nothing actually matched for the hash additions, or at least if we could tell it to do so as a non-default (since loadbalance always gets a little bit of hash from the source address); a rough sketch of what we mean follows below.

The other thing we discovered in testing is that, within a protocol, the query entity doesn't appear to work with the hash action, at least as of a build of hoststated from this morning. We were trying to see if we could work around the hashing issue by feeding it some client-specific data. The following fails horribly:

    request query hash "*" from "SS_LibHash"
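Coming back to the fallback idea above, something along these lines is roughly what we were picturing. This is only a sketch in plain C, not a patch against the real relay_from_table(); pick_host(), hash_had_input, and rr_index are made-up names for illustration:

    /*
     * Sketch of the fallback we have in mind.  hash_had_input would be
     * set whenever a "request hash"/"request cookie hash"/etc. rule (or
     * the source address, in loadbalance mode) actually contributed data
     * beyond the initial seed; when nothing matched, host selection
     * drops back to plain round-robin instead of every client landing
     * in the same bucket.
     */
    #include <stdint.h>

    static int rr_index;    /* round-robin position, kept per relay */

    static int
    pick_host(uint32_t hash, int hash_had_input, int nhosts)
    {
            if (!hash_had_input) {
                    /* nothing useful was hashed: fall back to round-robin */
                    rr_index = (rr_index + 1) % nhosts;
                    return (rr_index);
            }
            /* normal behaviour: bucket by the accumulated hash */
            return ((int)(hash % (uint32_t)nhosts));
    }

The round-robin position could presumably just be the same dstkey counter the committed roundrobin fix already maintains, so a fallback like this shouldn't need much new state.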
Cheers,

;P mn


On 2007/11/26 11:36 AM, "Preston Norvell" <[EMAIL PROTECTED]> muttered
eloquently:

> Thanks much.  We're working on getting it compiled and tested.
>
> Assuming testing goes well, our last major hurdle is the deterministic
> portion of the load balancing, which it sounds like you are thinking about
> already.
>
> Thanks much again,
>
> ;P mn
>
>
> On 2007/11/22 8:09 AM, "Reyk Floeter" <[EMAIL PROTECTED]> muttered
> eloquently:
>
>> ok, forget about this diff - i committed the first part (roundrobin)
>> but skipped the loadbalance part because it is wrong to look at the
>> client port in this case (because i want to provide session
>> persistence).
>>
>> On Thu, Nov 22, 2007 at 12:51:10PM +0100, Reyk Floeter wrote:
>>> - please try the attached diff, it will fix the roundrobin mode by
>>> saving the last index and traversing to the next available host.
>>>
>>> (you can also have a look at my little test program to verify the alg:
>>> http://team.vantronix.net/~reyk/q.c)
>>>
>>> - i'm also looking into improving the loadbalance mode. the attached
>>> diff includes the source port in loadbalance mode and the destination
>>> (relay) port in loadbalance and hash mode. make also sure that you
>>> feed in other variables if you want to get better results, for example
>>>
>>> request hash "Host"
>>>
>>> to feed the virtual hostname into the hash/loadbalance hash.
>>>
>>> reyk
>>>
>>> Index: hoststated.h
>>> ===================================================================
>>> RCS file: /cvs/src/usr.sbin/hoststated/hoststated.h,v
>>> retrieving revision 1.81
>>> diff -u -p -r1.81 hoststated.h
>>> --- hoststated.h    22 Nov 2007 10:09:53 -0000    1.81
>>> +++ hoststated.h    22 Nov 2007 11:45:00 -0000
>>> @@ -327,6 +327,7 @@ struct host {
>>>      u_long up_cnt;
>>>      int retry_cnt;
>>>      struct ctl_tcp_event cte;
>>> +    int idx;
>>>  };
>>>  TAILQ_HEAD(hostlist, host);
>>>
>>> Index: relay.c
>>> ===================================================================
>>> RCS file: /cvs/src/usr.sbin/hoststated/relay.c,v
>>> retrieving revision 1.65
>>> diff -u -p -r1.65 relay.c
>>> --- relay.c    22 Nov 2007 10:09:53 -0000    1.65
>>> +++ relay.c    22 Nov 2007 11:45:01 -0000
>>> @@ -463,6 +463,7 @@ relay_init(void)
>>>              if (rlay->dstnhosts >= RELAY_MAXHOSTS)
>>>                  fatal("relay_init: "
>>>                      "too many hosts in table");
>>> +            host->idx = rlay->dstnhosts;
>>>              rlay->dsthost[rlay->dstnhosts++] = host;
>>>          }
>>>          log_info("adding %d hosts from table %s%s",
>>> @@ -1876,10 +1877,14 @@ relay_hash_addr(struct sockaddr_storage
>>>          sin4 = (struct sockaddr_in *)ss;
>>>          p = hash32_buf(&sin4->sin_addr,
>>>              sizeof(struct in_addr), p);
>>> +        p = hash32_buf(&sin4->sin_port,
>>> +            sizeof(struct in_addr), p);
>>>      } else {
>>>          sin6 = (struct sockaddr_in6 *)ss;
>>>          p = hash32_buf(&sin6->sin6_addr,
>>>              sizeof(struct in6_addr), p);
>>> +        p = hash32_buf(&sin6->sin6_port,
>>> +            sizeof(struct in6_addr), p);
>>>      }
>>>
>>>      return (p);
>>> @@ -1903,7 +1908,7 @@ relay_from_table(struct session *con)
>>>      case RELAY_DSTMODE_ROUNDROBIN:
>>>          if ((int)rlay->dstkey >= rlay->dstnhosts)
>>>              rlay->dstkey = 0;
>>> -        idx = (int)rlay->dstkey++;
>>> +        idx = (int)rlay->dstkey;
>>>          break;
>>>      case RELAY_DSTMODE_LOADBALANCE:
>>>          p = relay_hash_addr(&con->in.ss, p);
>>> @@ -1933,6 +1938,8 @@ relay_from_table(struct session *con)
>>>          fatalx("relay_from_table: no active hosts, desynchronized");
>>>
>>> found:
>>> +    if (rlay->conf.dstmode == RELAY_DSTMODE_ROUNDROBIN)
>>> +        rlay->dstkey = host->idx + 1;
>>>      con->retry = host->conf.retry;
>>>      con->out.port = table->conf.port;
>>>      bcopy(&host->conf.ss, &con->out.ss, sizeof(con->out.ss));
>>>
>
> --
> Preston M Norvell <[EMAIL PROTECTED]>
> Systems/Network Administrator
> Serials Solutions <http://www.serialssolutions.com>
> Phone: (866) SERIALS (737-4257) ext 1094
>

--
Preston M Norvell <[EMAIL PROTECTED]>
Systems/Network Administrator
Serials Solutions <http://www.serialssolutions.com>
Phone: (866) SERIALS (737-4257) ext 1094

