[
https://issues.apache.org/jira/browse/KUDU-3127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Grant Henke updated KUDU-3127:
------------------------------
Labels: roadmap-candidate (was: )
> Have replica placement and rebalancing consider non-uniform hardware
> --------------------------------------------------------------------
>
> Key: KUDU-3127
> URL: https://issues.apache.org/jira/browse/KUDU-3127
> Project: Kudu
> Issue Type: Improvement
> Components: CLI, master
> Reporter: Andrew Wong
> Priority: Major
> Labels: roadmap-candidate
>
> We've seen multiple deployments suffer from the fact of life that data
> centers don't always have uniform hardware. Oftentimes, racks are composed
> of whatever hardware we can salvage from other projects. As such, Kudu's
> assumption that all tablet servers should be treated equally (sans location
> awareness) can be a bad one.
> There are a few pieces to making this better:
> * Having Kudu determine the relative capacity of each tablet server
> (either automatically, or as input by an operator)
> * Updating the replica placement policy to account for capacity across
> tablet servers
> * Bonus: have Kudu account for the space currently used on each tablet server
> Some things that might be worth considering:
> * It seems reasonable to assume that data directories are independent of
> one another, so we should be able to determine with relative ease the total
> capacity of a server by aggregating the total capacities of its data
> directories. This doesn't account for colocated WAL directories, but that
> might be a fine limitation, since we expect WAL usage to vary wildly as
> ingest workloads vary. The capacity could be heartbeated to masters
> periodically, or fetched via RPC by rebalancer tooling.
> * Updating the placement policy seems trickier, since the PO2C algorithm
> has a lot of nice properties (e.g. eventual fixing of skew), as does the
> assumption that all tablets have equal weight (e.g. it's harder to
> fall into the trap of moving a replica, only to move it back). Some variant
> of PO2C based on _available space_ instead of replica count might be
> worth considering for initial placement and for defining balance.
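
To make the two considerations above concrete, here is a minimal Python sketch, not Kudu code. It assumes PO2C refers to the "power of two choices" scheme (sample two candidate servers, keep the better one) and swaps the comparison from replica count to available space. TabletServer, total_capacity, pick_server, and place_replicas are hypothetical names made up for illustration; only shutil.disk_usage is a real standard-library call.

```python
import random
import shutil
from dataclasses import dataclass


@dataclass
class TabletServer:
    """Hypothetical stand-in for a tablet server; not a Kudu type."""
    name: str
    capacity_bytes: int
    used_bytes: int = 0

    @property
    def available_bytes(self) -> int:
        return self.capacity_bytes - self.used_bytes


def total_capacity(data_dirs):
    """Sum filesystem capacity over a server's data directories.

    Assumes each directory sits on its own device; directories that share
    a filesystem would be counted more than once.
    """
    return sum(shutil.disk_usage(d).total for d in data_dirs)


def pick_server(servers, rng=random):
    """Sample two distinct servers and keep the one with more available space."""
    a, b = rng.sample(servers, 2)
    return a if a.available_bytes >= b.available_bytes else b


def place_replicas(servers, count, replica_bytes, rng=random):
    """Place `count` equally sized replicas, one at a time."""
    for _ in range(count):
        chosen = pick_server(servers, rng)
        chosen.used_bytes += replica_bytes


if __name__ == "__main__":
    small = TabletServer("ts-small", capacity_bytes=100)
    big = TabletServer("ts-big", capacity_bytes=400)
    place_replicas([small, big], count=40, replica_bytes=10)
    print(small)
    print(big)
```

With only two servers, every draw compares both of them, so the demo does not depend on the random seed: the larger server absorbs replicas until the available space on the two is equal, and the remaining replicas alternate between them. Balance defined this way converges on equal free space rather than equal replica counts, which is the trade-off the last bullet raises. The sketch ignores WAL usage, location awareness, and unequal replica sizes.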