[ 
https://issues.apache.org/jira/browse/KUDU-3127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Henke updated KUDU-3127:
------------------------------
    Labels: roadmap-candidate  (was: )

> Have replica placement and rebalancing consider non-uniform hardware
> --------------------------------------------------------------------
>
>                 Key: KUDU-3127
>                 URL: https://issues.apache.org/jira/browse/KUDU-3127
>             Project: Kudu
>          Issue Type: Improvement
>          Components: CLI, master
>            Reporter: Andrew Wong
>            Priority: Major
>              Labels: roadmap-candidate
>
> We've seen multiple deployments suffer because data centers don't always
> have uniform hardware. Often, racks are composed of whatever hardware can be
> salvaged from other projects. As such, Kudu's assumption that all tablet
> servers should be treated equally (sans location awareness) can be a bad
> one.
> There are a few pieces to making this better:
>  * Having Kudu determine the relative capacity of each tablet server
> (either automatically, or as input by an operator)
>  * Updating the replica placement policy to account for capacity across 
> tablet servers
>  * Bonus: have Kudu account for the space currently used on each tablet
> server
> Some things that might be worth considering:
>  * It seems reasonable to assume that data directories are independent of
> one another, so we should be able to determine the total capacity of a
> server with relative ease by aggregating the capacities of its data
> directories. This doesn't account for colocated WAL directories, but that
> might be a fine limitation, since we expect WAL usage to vary wildly as
> ingest workloads vary. The capacity could be heartbeated to masters
> periodically, or fetched via RPC by rebalancer tooling.
>  * Updating the placement policy seems trickier, since there are a lot of
> nice properties that come with using the power-of-two-choices (PO2C)
> algorithm (e.g. eventual fixing of skew) and with assuming that all tablets
> have equal weight (e.g. it's harder to fall into the trap of moving a
> replica, only to move it back). Some variant of PO2C based on _available
> space_ instead of replica count might be worth considering for initial
> placement and for defining balance.
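
The two considerations above can be sketched roughly as follows. This is an illustrative Python sketch, not Kudu code: `TabletServer`, `total_capacity_bytes`, `pick_server`, and `place_replica` are hypothetical names, and it assumes each data directory sits on its own filesystem and that a replica's size can be estimated at placement time.

```python
import random
import shutil
from dataclasses import dataclass


def total_capacity_bytes(data_dirs):
    """Sum the capacity of the filesystem backing each data directory.

    Assumption: every directory is on its own filesystem. Directories that
    share a filesystem would be double-counted by this simple sum.
    """
    return sum(shutil.disk_usage(d).total for d in data_dirs)


@dataclass
class TabletServer:
    # Hypothetical record for illustration; not Kudu's tablet server descriptor.
    name: str
    capacity_bytes: int
    used_bytes: int = 0

    @property
    def available_bytes(self):
        return self.capacity_bytes - self.used_bytes


def pick_server(servers, rng=random):
    """PO2C variant: sample two distinct servers, keep the one with more free space."""
    if len(servers) < 2:
        return servers[0]
    a, b = rng.sample(servers, 2)
    return a if a.available_bytes >= b.available_bytes else b


def place_replica(servers, replica_bytes, rng=random):
    """Place one replica and charge its (estimated) size to the chosen server."""
    chosen = pick_server(servers, rng)
    chosen.used_bytes += replica_bytes
    return chosen
```

Comparing the two sampled candidates by available space rather than replica count is the only change from plain PO2C here. Whether such a variant keeps the properties mentioned above (eventual fixing of skew, avoiding back-and-forth moves) is exactly the open question the description raises.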



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
