Hello,

Can you please take a look at [1] and make sure it is what you had in mind? I read through our conversation here again, and my understanding is that the change we wanted to implement allows selecting the hash function in addition to map-based/consistent and avalanche.
The change provides the ability to specify the hash function, where <> indicates an optional argument:

    hash-type consistent <sdbm/djb2/wt6>
    hash-type map-based <sdbm/djb2/wt6>
    hash-type avalanche <sdbm/djb2/wt6>

Not all of it is implemented yet; I am in the middle of testing, but I wanted any early feedback you might have before I spend a lot of time on it.

Thanks
-Bhaskar

[1] https://github.com/maddalab/haproxy/pull/1

On Fri, Oct 25, 2013 at 2:19 AM, Willy Tarreau <[email protected]> wrote:
> Hi Bhaskar,
>
> First, you did amazing work, thanks for sharing your findings!
>
> On Thu, Oct 24, 2013 at 11:12:14PM -0400, Bhaskar Maddala wrote:
> > Hello,
> >
> > This is a little long, please bear with me. I was able to run more
> > tests; the results are at [1]. As I mentioned before, I was not able to
> > reproduce the distribution we had seen when we implemented our patch in
> > Haproxy using hashcount.c yesterday; the only remaining difference was
> > actually having the requests served via an instance of haproxy.
> >
> > I broke down the tests into the following 3 categories:
> > (a) ~10K requests
> > (b) all unique host names obtained from 1MM requests grepped from
> >     haproxy production logs; there were 48259 blogs
> > (c) 1MM requests
> >
> > Case (a) used the data set we had based our decision on; cases (b) and
> > (c) used a data set I obtained at the beginning of this conversation.
> >
> > I then ran these 3 scenarios via both haproxy and hashcount.c for DJB2
> > and SDBM. I was able to reproduce* the distribution we had based our
> > decision on when using haproxy. Interestingly, the test harness
> > (hashcount.c) produces a wildly different distribution when its results
> > are compared to an instance of Haproxy, more so for SDBM.
>
> I'm thinking about something. If you're running with consistent hashing,
> another element matters a lot: the average distance between all output
> values.
> Indeed, with consistent hashing, we're not dividing the hash by
> the number of servers; we're searching for the point that's closest to
> the hash and assigning the connection to the server associated with this
> point. So if the hash produces clusters of points, there are more risks
> that some will get more traffic than others due to their proximity to
> server points. I think that the image below summarizes what I mean very
> well:
>
>    http://offthelip.org/?p=149
>
> Right now, each server appears 16*weight times on the consistent hash
> circle, and these places are pseudo-random and uniform (I spent a lot
> of time ensuring this in the past, so I'm pretty sure about it, but
> I'd be happy if we can improve it). You can check in lb_chash.c how
> this is done in chash_init_server_tree().
>
> I don't know how to reliably model that in hashcount.c. I can't
> find the test code I used when developing the consistent hash, so if
> we want to run new experiments, I guess we'll have to copy-paste
> existing code into a new file.
>
> I'm thinking about something: since we want the points to all be as far
> as possible from each other, we could measure the hash's efficiency for
> consistent hashing this way:
>
>   - print all hash points
>   - sort them by value
>   - replace each of them with the distance from the previous one
>   - compute the stddev on these values
>
> If you're interested in running a test using this, that would be great.
> You can restart from the hashcount you already have, since you have
> already implemented the stddev in it.
>
> > As a sanity check, I verified that the results were reproducible under
> > repeated executions.
> >
> > My test haproxy configuration is available at [2].
> > With the goal of eliminating as many variables as possible, I set up
> > 17 backends that all use a single nginx server serving static files
> > from the file system; the nginx config is also available at [2].
> >
> > A request flow would look like this:
> > curl with custom header -> Haproxy on 80 -> hash on custom header to
> > select backend -> (all backends point to the same nginx instance) ->
> > request forwarded to backend nginx -> nginx responds
> >
> > Something like [3] was executed from the machine housing haproxy and
> > nginx.
> >
> > Finally, a note on the results [1]. Each sheet corresponds to one of
> > the cases (a)/(b)/(c). On the first sheet you will see the std dev
> > numbers, under the heading "executing via haproxy", that we based our
> > decision on. The raw results are also available at [2].
> >
> > Let me know if you would like to see more numbers with a changed
> > methodology etc., or if you have an explanation for the difference in
> > distribution when using haproxy vs. hashcount.c after taking a look at
> > the config.
>
> Yes, please try to run a test modeling consistent hashing as explained
> above. If something is not clear, do not hesitate to ask.
>
> > As next steps, unless I hear otherwise from you, I am going to do the
> > following: implement the patch that I had suggested when I started this
> > thread, in which I will include wt_hash6 (correct?). I think these
> > results show that there is a justification for it.
>
> OK. But please provide the patch for 1.5; we're not supposed to add new
> features to 1.4. It's very likely that the same patch will easily apply
> to both versions.
>
> > Once I have the patch building, I will retest these for sanity checks.
> > Following that, I will try to determine why there is a difference
> > between the test harness and an executing haproxy instance. My tests
> > do not include avalanche because I am testing only on 1.4.
>
> I feel like we're going to improve the current hash method (at least the
> consistent hash). It's been a long time since we've done research on
> something like this, and I like it when we're offering new features based
> on experimentation and research!
>
> Best regards,
> Willy
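For illustration, the hash-type syntax proposed at the top of this thread might look like this in a backend section. This is only a sketch of the proposed, not-yet-merged syntax; the backend name, header name, and server addresses are invented for the example:

```
# Hypothetical backend using the proposed syntax: the hash function
# (sdbm/djb2/wt6) becomes an optional extra argument to hash-type.
backend blogs
    balance hdr(X-Blog-Host)
    hash-type consistent sdbm
    server s1 127.0.0.1:8081
    server s2 127.0.0.1:8082
```

With `balance hdr(...)`, haproxy hashes the named request header to pick a server, which matches the custom-header test flow described in the thread.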

