[ https://issues.apache.org/jira/browse/HBASE-11165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13999619#comment-13999619 ]
Lars Hofhansl commented on HBASE-11165:
---------------------------------------
We'll run into other limitations before we hit META size issues, I guess. Each
column family of each region has its own memstore. With (say) a 30 GB heap,
128 MB memstores, 40% of the heap reserved for memstores, and one column family
per region, a region server can host only 96 regions (12 GB / 128 MB). We'd need
more than 10k servers for 1M regions.
Even if we assume that memstores are on average only 50% full, we still need
more than 5k servers for 1M regions.
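The two estimates above can be checked with a small back-of-the-envelope sketch. It assumes, as the numbers imply, one column family (so one memstore) per region and that memstore heap is the only limit; the function names are illustrative, not HBase APIs.

```python
def regions_per_server(heap_mb: int, memstore_pct: int, memstore_mb: int,
                       fill_pct: int = 100) -> int:
    """Regions one server can host when memstore heap is the only limit.

    Assumes one column family per region (one memstore per region) and that
    each memstore is, on average, fill_pct percent full.
    """
    budget_mb = heap_mb * memstore_pct // 100  # heap reserved for memstores
    # budget_mb / (memstore_mb * fill_pct / 100), kept in integer arithmetic
    return budget_mb * 100 // (memstore_mb * fill_pct)


def servers_needed(total_regions: int, per_server: int) -> int:
    """Ceiling division: servers required to host total_regions."""
    return -(-total_regions // per_server)


heap_mb = 30 * 1024  # 30 GB heap
full = regions_per_server(heap_mb, 40, 128)               # memstores full
half = regions_per_server(heap_mb, 40, 128, fill_pct=50)  # 50% full on average
print(full, servers_needed(1_000_000, full))  # 96 10417
print(half, servers_needed(1_000_000, half))  # 192 5209
```

Halving the average fill level doubles the regions per server, which is why the server count only drops from roughly 10k to roughly 5k.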
Now, maybe only a few regions are being written; in that case we need much less
heap for the memstores.
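A minimal sketch of that case, assuming idle regions hold essentially empty memstores so that only actively written regions need a full memstore's worth of heap. The figure of 1,000 regions per server (1M regions on 1,000 servers) and the 50 active regions are hypothetical, chosen only for illustration.

```python
def min_heap_mb(active_regions: int, memstore_mb: int = 128,
                memstore_pct: int = 40) -> int:
    """Heap (MB) needed so every actively written region can fill one memstore.

    Assumes idle regions hold (near-)empty memstores and one column family
    per region; memstore_pct is the share of heap reserved for memstores.
    """
    memstore_total_mb = active_regions * memstore_mb
    # Ceiling of memstore_total_mb / (memstore_pct / 100)
    return -(-memstore_total_mb * 100 // memstore_pct)


print(min_heap_mb(1000))  # 320000 MB (~312 GB): all 1,000 regions written
print(min_heap_mb(50))    # 16000 MB (~15.6 GB): only 50 of them written
```

Under this assumption the memstore bound depends on the write working set, not on the total number of hosted regions.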
And maybe we can make the memstores smaller (64 or 32 MB), but then we'd get
lots of flushes and heavy write amplification.
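The trade-off can be sketched under the same assumptions as before (30 GB heap, 40% for memstores, one column family per region), plus the simplifying assumption that a region flushes each time its memstore reaches the configured size. Each flush produces a new store file that compaction later rewrites, which is where the extra write amplification comes from; the sketch only counts flushes and does not model compaction cost.

```python
HEAP_MB = 30 * 1024  # 30 GB heap
MEMSTORE_PCT = 40    # share of heap reserved for memstores


def tradeoff(memstore_mb: int, written_mb: int = 1024) -> tuple[int, int]:
    """Return (regions per server, flushes per region per written_mb).

    Assumes a flush happens each time a memstore fills to memstore_mb.
    """
    budget_mb = HEAP_MB * MEMSTORE_PCT // 100
    regions = budget_mb // memstore_mb
    flushes = -(-written_mb // memstore_mb)  # ceiling division
    return regions, flushes


for size in (128, 64, 32):
    regions, flushes = tradeoff(size)
    print(f"{size} MB memstore: {regions} regions/server, "
          f"{flushes} flushes per GB written")
```

Each halving of the memstore size doubles both the regions a server can host and the number of (smaller) files flushed per GB written.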
We should also discuss why few, large regions are bad, and whether we can
decouple the unit of distribution (a region) from whatever unit we're trying to
operate on. Maybe one mapper per region is not good if regions can grow to
20 GB: assuming we can ideally read around 100 MB/s, a single mapper would need
more than 3 minutes (roughly 200 seconds) to scan through 20 GB.
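The scan-time estimate above, as a quick sketch. The `splits` parameter is a hypothetical illustration of decoupling the unit of work from the region (several readers each scanning part of one region's key range) and assumes throughput scales linearly with readers, which real disks and region servers will not guarantee.

```python
def scan_seconds(region_gb: float, mb_per_s: float, splits: int = 1) -> float:
    """Seconds to scan one region, optionally divided among parallel readers.

    Hypothetical: assumes each of `splits` readers gets the full mb_per_s.
    """
    return region_gb * 1024 / mb_per_s / splits


print(round(scan_seconds(20, 100), 1))            # 204.8 s, about 3.4 minutes
print(round(scan_seconds(20, 100, splits=8), 1))  # 25.6 s with 8 sub-splits
```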
> Scaling so cluster can host 1M regions and beyond (50M regions?)
> ----------------------------------------------------------------
>
> Key: HBASE-11165
> URL: https://issues.apache.org/jira/browse/HBASE-11165
> Project: HBase
> Issue Type: Brainstorming
> Reporter: stack
>
> This discussion issue comes out of "Co-locate Meta And Master HBASE-10569"
> and comments on the doc posted there.
> A user -- our Francis Liu -- needs to be able to scale a cluster to 1M
> regions, maybe even 50M later. This issue is about discussing how we will do
> that (or, if not 50M on one cluster, how else we can attain the same end).
> More detail to follow.
--
This message was sent by Atlassian JIRA
(v6.2#6252)