[
https://issues.apache.org/jira/browse/HBASE-11165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114633#comment-14114633
]
Mikhail Antonov commented on HBASE-11165:
-----------------------------------------
bq. Please pile on all with thoughts. We need to put stake in grounds soon for
hbase 2.0 cluster topology.
2 humble cents from my side:
- I thought that the primary requirement for splittable meta is not really
read/write throughput, but rather that on a large enough cluster with 1M+
regions, meta may simply not fit in master memory (or the JVM would have a
hard time keeping up a process that consumes that much memory)? That only
gets worse if we take any steps toward keeping more metadata in the meta
table. [~enis] I believe you brought up before other things we may want to
add to the meta table, which would inflate its size? Couldn't find that
jira/thread. I would think that if we want to keep regions as the unit of
assignment/recovery, splittable meta is a must (so far I haven't seen an
approach describing how to avoid it); a rough back-of-envelope estimate is
below.
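Just to put a rough number on the memory concern (purely illustrative; the
bytes-per-region figure below is my assumption, not a measured size of an
hbase:meta row):
{code:java}
// Back-of-envelope only: assumes ~1 KB of meta payload per region (region info,
// server/seqnum columns, overhead). The real figure depends on what else we
// end up piling into meta.
public class MetaSizeEstimate {
  public static void main(String[] args) {
    long bytesPerRegion = 1024L; // assumption, not a measurement
    long[] regionCounts = { 1000000L, 50000000L };
    for (long regions : regionCounts) {
      double gb = (regions * bytesPerRegion) / (1024.0 * 1024.0 * 1024.0);
      System.out.printf("%,d regions -> ~%.1f GB of meta to keep hot%n", regions, gb);
    }
  }
}
{code}
With ~1 KB/region that's roughly 1 GB of meta for 1M regions and close to
50 GB for 50M, which is where the "may not fit in master memory" worry comes
from.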
bq. ..and until we have HBASE-10295 "Refactor the replication implementation to
eliminate permanent zk node" and/or HBASE-11467 "New impl of Registry interface
not using ZK + new RPCs on master protocol" (Maybe a later phase of HBASE-10070
when followers can run closer in to the leader state would work here) or a new
master layout where we partition meta across multiple master server.
Unless I've missed some recent developments, HBASE-10070 is about region
replicas, while HBASE-11467 is about the ZK-less client (the patch there is
growing big enough to provide a fully ZK-less client; it's absorbing other
subtasks :) ). It may be worth reiterating that the ZK-less client is a
prerequisite for, or a component of, the multi-master approach we're working
on now, but it would work fine with the current single-active/many-backup-
masters scheme as well. A rough sketch of what such a client-side registry
could look like is below.
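Just to illustrate the shape of it (a hypothetical sketch, not the actual
HBASE-11467 patch; the interface and method names are made up): everything the
client bootstraps from ZK today would instead come over RPC from the masters
listed in the client config.
{code:java}
import java.io.IOException;
import java.util.List;
import org.apache.hadoop.hbase.HRegionLocation;
import org.apache.hadoop.hbase.ServerName;

// Hypothetical sketch only -- illustrative names, not the HBASE-11467 API.
public interface ClusterRegistry {
  String getClusterId() throws IOException;                     // today: /hbase/hbaseid in ZK
  ServerName getActiveMaster() throws IOException;              // today: /hbase/master
  List<HRegionLocation> getMetaLocations() throws IOException;  // today: /hbase/meta-region-server
}
{code}
A ZK-less implementation would answer these via master RPCs (round-robining
over the configured masters); once backup masters can serve such read-only
calls, it dovetails with the multi-master work, which is why I call it
complementary.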
I'm thinking that multi-master and partitioned-master (if we go down these
paths) need to be discussed closely together, each taking the other into
account; otherwise it would be really hard to merge them later on.
Also on this:
bq. A plus split meta has over colocated master and meta is that master
currently can be down for some period of time and the cluster keeps working; no
splits and no merges and if a machine crashes while master is down, data is
offline till master comes back (needs more exercise). This is less the case
when colocated master and meta.
I'd be curious to hear more opinions/assessments on how bad it is when the
master is down, and what timeframes various people would consider "generally
ok", "kind of long, really want it to be faster", and "unacceptably long"?
{quote}So far it seems to me the driving requirements are:
+ scale
+ high availability
+ stop using zookeeper completely/for persistence
{quote}
Yeah, I think those are exactly the points, and they could be discussed
together. Besides scale, HA here probably consists of 2 parts: HA for region
replicas (read-only and read-write), and improved HA for the master. Improved
master HA (multi-master) is being researched/worked on now.
On "stop using ZK completely" there are general changes here coming along (like
see HBASE-7767, on stopping using ZK for keeping table state.. a patch from
[~octo47] is there ready for reviews), and proposed changes on client side to
make hbase client non-dependent on ZK (that's HBASE-11467 [~stack] mentioned
above, and that's what would be complementary to multi-master work).
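For the table-state part, the shape of the change (a hypothetical sketch, not
the actual HBASE-7767 patch; the interface below is made up for illustration)
is that table state becomes something the master persists in an HBase-owned
store instead of ZK:
{code:java}
import java.io.IOException;
import org.apache.hadoop.hbase.TableName;

// Hypothetical sketch only -- not the HBASE-7767 API. The point: table state is
// persisted by the master (e.g. in hbase:meta or a system table), not in ZK, so
// nothing has to watch ZK nodes to learn whether a table is enabled or disabled.
public interface TableStateStore {
  enum State { ENABLED, ENABLING, DISABLED, DISABLING }

  void setState(TableName table, State state) throws IOException; // durable; survives master failover
  State getState(TableName table) throws IOException;             // clients ask the master, not ZK
}
{code}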
> Scaling so cluster can host 1M regions and beyond (50M regions?)
> ----------------------------------------------------------------
>
> Key: HBASE-11165
> URL: https://issues.apache.org/jira/browse/HBASE-11165
> Project: HBase
> Issue Type: Brainstorming
> Reporter: stack
> Attachments: HBASE-11165.zip, Region Scalability test.pdf,
> zk_less_assignment_comparison_2.pdf
>
>
> This discussion issue comes out of "Co-locate Meta And Master HBASE-10569"
> and comments on the doc posted there.
> A user -- our Francis Liu -- needs to be able to scale a cluster to do 1M
> regions maybe even 50M later. This issue is about discussing how we will do
> that (or if not 50M on a cluster, how otherwise we can attain same end).
> More detail to follow.