[
https://issues.apache.org/jira/browse/HBASE-12233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14168009#comment-14168009
]
stack commented on HBASE-12233:
-------------------------------
bq. We found a 5x win on meta versions.
5x in what? i/o? memory used? Ability to cache? Shrinkage in HDFS size
occupied by meta?
bq. We haven't tried meta region replicas.
When it lands, a while yet it seems, this will scale read i/o only. Not write
i/o. It won't do anything about the size of meta in the fs. We'll waste mem
caching the same stuff multiple times, once per replica. Also, regarding stale
reads of meta, IMO we should not be cavalier. Methinks it will make for a new
class of problems.
bq. We haven't tried allowing a shared cache in front of meta.
At first blush, adding an external system to manage our cache seems way more
complicated than adding back an extra tier. I'd have to hear more.
bq. We haven't tried a more compact meta table representation
That'd work for the in-memory representation -- being able to represent bigger
clusters in-mem -- but it does nothing to address the i/o carried by the
meta-carrying server, nor does it address the size of meta in HDFS (write
amplification from continually rewriting big files). We'll also be burning CPU
at a higher rate converting the compact representation to pb and back over and
over again.
bq. We haven't tried picking smaller split keys.
Fair point. We should do this for sure. You think this would make a difference
when hosting 1M regions or 50M? How much? 10/20%?
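As an aside, a minimal sketch of what "smaller split keys" means in practice
(illustrative only, not from this JIRA; assumes the stock
HBaseAdmin/HTableDescriptor API): each region's start key is copied into its
hbase:meta row key, so short split points keep meta rows short.
{code:java}
// Hedged sketch: create a table with deliberately short split keys so the
// region start keys embedded in every hbase:meta row key stay small.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class ShortSplitKeysExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (HBaseAdmin admin = new HBaseAdmin(conf)) {
      HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("t"));
      desc.addFamily(new HColumnDescriptor("f"));
      // 15 one-byte split points (16 regions); a long composite prefix here
      // would be repeated in every one of this table's rows in hbase:meta.
      byte[][] splits = new byte[15][];
      for (int i = 0; i < splits.length; i++) {
        splits[i] = new byte[] { (byte) ((i + 1) * 16) };
      }
      admin.createTable(desc, splits);
    }
  }
}
{code}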
bq. We haven't tried stripe compactions on meta.
Stripe compactions work best when writes are time-series. They make the
compaction story worse when writes are evenly distributed. Meta could be of
either type
in any particular deploy.
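For anyone who does want to experiment anyway, a hedged sketch of pointing a
table at the stripe store engine via per-table configuration overrides (key
names per the stripe compaction docs; whether doing this to hbase:meta is wise
is exactly the open question above):
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class StripeMetaExperiment {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (HBaseAdmin admin = new HBaseAdmin(conf)) {
      TableName meta = TableName.META_TABLE_NAME;
      HTableDescriptor desc = admin.getTableDescriptor(meta);
      // Per-table override switching the store engine to stripe compactions.
      desc.setConfiguration("hbase.hstore.engine.class",
          "org.apache.hadoop.hbase.regionserver.StripeStoreEngine");
      desc.setConfiguration("hbase.hstore.stripe.initialStripeCount", "4");
      admin.modifyTable(meta, desc);
    }
  }
}
{code}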
bq. We haven't gotten block encoding working for meta.
Doesn't address i/o or caching (we can cache encoded blocks, but currently we
just dumbly decode them on each access -- we'd have to add smarts), and block
encoding would probably help some with size-on-disk, but what, cut size by 50%
max? Maybe. A 3G meta region down to 1.5G when hosting 1M regions, at best?
It'd still be too big?
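For completeness, a hedged sketch of what turning on data block encoding for
meta's info family would look like (FAST_DIFF chosen only as an illustration;
the decode-on-every-access caveat above still stands):
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HConstants;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.io.encoding.DataBlockEncoding;

public class EncodeMetaInfoFamily {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (HBaseAdmin admin = new HBaseAdmin(conf)) {
      TableName meta = TableName.META_TABLE_NAME;
      HTableDescriptor desc = admin.getTableDescriptor(meta);
      // FAST_DIFF suits meta's long, highly repetitive row keys; blocks are
      // still decoded on each access, so i/o and caching are not helped.
      HColumnDescriptor info = desc.getFamily(HConstants.CATALOG_FAMILY);
      info.setDataBlockEncoding(DataBlockEncoding.FAST_DIFF);
      admin.modifyTable(meta, desc);
    }
  }
}
{code}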
bq. While split meta helps maybe 0.1% of HBase's users.
Factor in that we want to pivot and tell folks that small regions are the way
to go (compactions): i.e. 10x or 100x the region count they are carrying
currently... so being simplistic, your 0.1% becomes 1% or 10%. Also, this
fraction are actually our most critical users, our 'fortune' 100 as it were,
the reason for hbase, the folks who use us because they need to scale.
bq. The NN is still a bottleneck on cluster size
Yeah. But that is another issue we should address separately (it so happens
that the folks proposing this patch may be able to help in this regard).
bq. So making every user and every cluster feel the pain of increased
complexity and mttr....
...let's discuss this. It's a problem, I agree.
bq. ....so that we the HBase developers can perform a thought experiment .....
This is not a thought experiment as I understand it. The lads are up against a
scaling problem.
bq. I am for sure against bringing root back in for branch-1
For sure root will not be back for branch-1. That said, should community
brothers and sisters need a patch that adds root back as a bridge to branch-1
and beyond, I do not preclude it, as long as it is behind a million safety
valves and switches so it does not disturb the mainline code paths.
This is a tough one. Should we go to the mailing list with this? I can rehash
the doc I wrote up over on the 1M/50M JIRA or I can paraphrase bits of the
nice job [~mantonov] did summarizing this issue in his recent meetup talk.
Good on you [~eclark]
> Bring back root table
> ---------------------
>
> Key: HBASE-12233
> URL: https://issues.apache.org/jira/browse/HBASE-12233
> Project: HBase
> Issue Type: Sub-task
> Reporter: Virag Kothari
> Assignee: Virag Kothari
> Attachments: HBASE-12233.patch
>
>
> First step towards splitting meta.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)