[ 
https://issues.apache.org/jira/browse/HELIX-196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13732243#comment-13732243
 ] 

Kanak Biscuitwala commented on HELIX-196:
-----------------------------------------

This is pretty interesting: it solves many of the same problems that we try to 
solve in our auto-rebalancer, while also taking physical node locations into 
account if we care about them. 

They've provided a few mapping algorithms for different techniques, all of 
which are based on the CRUSH algorithm that came out of UCSC. Some of them 
also consider a metric that they call RDF, which reflects not only how many 
replicas there are, but also how many nodes share a chunk of data.
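
As a rough illustration of what such a metric captures, here is a small 
sketch. It follows my reading of the description above and is not libcrunch's 
definition or API: for each node, it counts how many other nodes hold a 
replica of at least one partition that the node also holds. The function name, 
the mapping, and the node names are all hypothetical.

```python
from collections import defaultdict


def sharing_degree(mapping):
    """For each node, count the distinct other nodes it shares a partition with.

    `mapping` is a hypothetical partition -> list-of-nodes assignment.
    """
    peers = defaultdict(set)
    for nodes in mapping.values():
        for node in nodes:
            # Every other holder of this partition is a peer of `node`.
            peers[node].update(n for n in nodes if n != node)
    return {node: len(others) for node, others in peers.items()}


# Hypothetical assignment: 3 partitions, 2 replicas each, over 4 nodes.
mapping = {
    "p0": ["n1", "n2"],
    "p1": ["n1", "n3"],
    "p2": ["n2", "n4"],
}
degrees = sharing_degree(mapping)
```

With the replica count held fixed, different placements can give quite 
different values here, which is why the replica count alone does not describe 
how widely data is shared.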

It's still under development, and documentation is still a work in progress, 
but here's what I think are the key considerations from trying out the current 
release:

1. There may be some tricks we need to play to ensure that capacity constraints 
are satisfied. It looks like they're currently respected when possible, but 
ignored otherwise. In our case, it's more acceptable to not serve a replica 
than to have a node promise it and be overloaded. One way to do this is to 
advertise fewer replicas to the algorithm than the number that actually exist.
2. We'd need to decide the topology we want to expose to the algorithm. The 
simplest one is that every node is a direct child of the root.
3. What is the optimal RDF for our purposes?
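
The workaround in point 1 could be sketched as follows. This is a minimal 
illustration under simple assumptions: every replica costs one unit of 
capacity, and every partition gets the same replica count. The function and 
parameter names are hypothetical, and nothing here uses libcrunch's API.

```python
def replicas_to_advertise(capacities, num_partitions, desired_replicas):
    """Largest replica count, up to desired_replicas, that fits total capacity.

    Assumes each replica consumes one unit of capacity and all partitions
    get the same replica count. The result is also capped at the number of
    nodes with spare capacity, since two replicas of one partition should
    not land on the same node.
    """
    if num_partitions <= 0:
        return 0
    total_capacity = sum(capacities.values())
    fits = total_capacity // num_partitions
    usable_nodes = sum(1 for c in capacities.values() if c > 0)
    return max(0, min(desired_replicas, fits, usable_nodes))


# Flat topology as in point 2: every node is a direct child of the root,
# so a plain node -> capacity dictionary is enough to describe it.
capacities = {"n1": 4, "n2": 4, "n3": 2}
advertised = replicas_to_advertise(capacities, num_partitions=5, desired_replicas=3)
```

Fitting within total capacity is only a necessary condition. A real check 
would also have to confirm that a placement exists that respects each node's 
individual limit.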
                
> Research libcrunch, add it as a new rebalancing strategy
> --------------------------------------------------------
>
>                 Key: HELIX-196
>                 URL: https://issues.apache.org/jira/browse/HELIX-196
>             Project: Apache Helix
>          Issue Type: Improvement
>          Components: helix-core
>            Reporter: Kanak Biscuitwala
>            Assignee: Kanak Biscuitwala
>            Priority: Minor
>
> Twitter just open-sourced libcrunch, so it may be of interest to add it as a 
> new rebalancing strategy.
> Source: https://github.com/twitter/libcrunch

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
