Hi Bhaskar,
On Thu, Nov 14, 2013 at 10:10:44AM -0500, Bhaskar Maddala wrote:
> > Worse, when applying consistent hashing on that, only two servers
> > got the load for several seconds, then two other ones.
>
> This is interesting. I did not do a time series distribution in my testing
> and this is good to know. I will probably do this in the next couple of days.
Yes, this could be interesting. In my case the injector simply incremented a
counter for each new request, so requests arrive like this:
GET /?id=1 HTTP/1.1
GET /?id=2 HTTP/1.1
GET /?id=3 HTTP/1.1
GET /?id=4 HTTP/1.1
etc...
The ID wraps at 2^32 but that was never reached. That said, this workload is
perfectly valid and probably exists in some places, where visitors could be
assigned to servers in waves depending on their arrival order.
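For anyone who wants to look at the distribution over time rather than only in
aggregate, here is a minimal sketch of such a test. It is not the injector or
the haproxy code: the ring construction (MD5-placed points), the server names,
the window size and the choice of djb2 as the hash under test are all
assumptions made for illustration. It feeds sequential IDs through a hash,
maps them onto a simple consistent-hash ring, and counts hits per server in
each window of consecutive requests.

```python
import bisect
import hashlib
from collections import Counter


def djb2(key: str) -> int:
    """Classic djb2 string hash (h = h * 33 + c), truncated to 32 bits.
    Used here only as an example of a hash function under test."""
    h = 5381
    for ch in key.encode():
        h = (h * 33 + ch) & 0xFFFFFFFF
    return h


def build_ring(servers, points_per_server=100):
    """Hypothetical ring: each server gets several points whose positions
    come from MD5, so that ring placement itself is well spread."""
    ring = []
    for name in servers:
        for i in range(points_per_server):
            digest = hashlib.md5(f"{name}#{i}".encode()).digest()
            ring.append((int.from_bytes(digest[:4], "big"), name))
    ring.sort()
    return ring


def lookup(ring, positions, h):
    """Return the server owning the first ring point at or after h,
    wrapping around to the first point past the end of the ring."""
    idx = bisect.bisect_left(positions, h) % len(ring)
    return ring[idx][1]


def windowed_distribution(hash_fn, servers, total, window):
    """Hash the IDs 1..total in arrival order and return one Counter of
    per-server hits for each window of `window` consecutive requests."""
    ring = build_ring(servers)
    positions = [p for p, _ in ring]
    windows = []
    for start in range(1, total + 1, window):
        counts = Counter()
        for i in range(start, min(start + window, total + 1)):
            counts[lookup(ring, positions, hash_fn(str(i)))] += 1
        windows.append(counts)
    return windows


if __name__ == "__main__":
    servers = ["srv1", "srv2", "srv3", "srv4"]  # hypothetical names
    for n, counts in enumerate(windowed_distribution(djb2, servers, 1000, 100)):
        print(n, dict(counts))
```

A window in which only one or two servers appear would be the kind of pattern
described above. Swapping in another hash function for djb2 lets you compare
them on the same input.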
> > So in the end, I changed my mind regarding
> > the wt6 function and agreed to reintroduce it because it performed
> > better for such workloads in my tests, since I can't produce these
> > nasty patterns with it.
>
> > pick the best hashing function for the job depending on what
> > is being hashed
>
> I agree, the choice of the function depends on the input more than
> one would like it to. Hence, my results were usually prefixed with,
> "on my dataset".
I absolutely agree with this and that's why I also changed my mind in
the end. I think we'll have to try to port SipHash, which is theoretically
almost as good as a crypto algorithm. If it works well for everything,
we could even imagine switching to it as the default in the
future (e.g. for 1.6).
> > I'm attaching the 3 resulting patches here. Please tell me if that's
> > OK for you, in which case I'm going to merge them.
>
> Please do.
OK.
> The big difference I see between what I did and here is the
> bit masking around function and algorithm.
I know and this is purely a detail. I was unhappy with these bits
that required a lot of manipulation and my experience tells me that
when something does not go smoothly, it introduces bugs in the long
term. So I tried to understand why we had this and found this reason.
> As I mentioned, I tried
> a few alternatives and settled on one that worked for me. Another
> criterion that I used was to have the least number of changes to
> add an additional hashing function.
Oh I know what it's like, don't worry! It was an experimental
feature which we adapted several times after tests and feedback.
What matters is that in the end we're both pleased with it.
> I will also apply the patches and do a few sanity tests on my end
> later today.
OK great.
> Sorry about the delay in mailing the test results csv. I did try that
> yesterday but the download as csv generated some fairly poor
> results and I need to clean those up as well. I will do that today.
Don't worry, there's no rush. I couldn't find enough contiguous time
either, being busy with bugs etc... Fortunately there are people on this list
who are much more responsive than me :-)
Best regards,
Willy