[ https://issues.apache.org/jira/browse/HBASE-8480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13648220#comment-13648220 ]

Lars George commented on HBASE-8480:
------------------------------------

[~stack] Yes, that makes sense, we need to define the various steps. For 
development I think it is good to keep the local and pseudo-distributed modes as 
before; we would just add another choice, i.e. spinning up HDFS as well inside 
the HBase processes. Going from one to two and then three servers is the tricky 
part. By default, if you unpack a tarball and start HBase, it should stay in 
local mode as usual. The same goes for pseudo-distributed mode, i.e. the user 
will need to configure that herself as needed.

But then, if wanted, we could have a little CLI config wizard that preps the new 
fully autonomous mode of HBase, where everything is controlled by HBase itself 
but all the cluster components are present. Once you start this, you would 
either have one server running a Master+NameNode process plus a 
RegionServer+DataNode process, or, if we had the above Master-less option, a 
single process with everything inside - these are just semantics, I'd say. 
Personally, though, I would love to see the latter eventually, making HBase a 
single-process system to boot with.
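To make the wizard idea concrete, it might emit something like the following into hbase-site.xml. Note that these property names are purely hypothetical, invented here to illustrate the shape of the config; only dfs.replication is an existing HDFS setting:

```xml
<!-- Hypothetical output of the CLI config wizard for the
     fully autonomous, embedded-HDFS mode. Property names
     (other than dfs.replication) do not exist today. -->
<property>
  <name>hbase.cluster.mode</name>
  <value>autonomous</value>
</property>
<property>
  <name>hbase.embedded.hdfs</name>
  <value>true</value>
</property>
<property>
  <name>dfs.replication</name>
  <!-- "automatic" = scale from 1 up to the default with node count -->
  <value>automatic</value>
</property>
```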

Then, as you add nodes, you simply join a new RS+DN process on another node. The 
config could be amended - but it could also stay the same, with the 
dfs.replication factor set to "automatic", which scales it from 1 up to the 
default (or a maximum we can specify). In other words, when you run two nodes 
you set the replication to 2, with three nodes to 3, with four nodes you leave 
it at 3, and so on.
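The "automatic" scaling rule above is simple enough to sketch. Here is a minimal model of it (function and parameter names are mine, not from any HBase or HDFS API):

```python
DEFAULT_REPLICATION = 3  # the stock HDFS default


def automatic_replication(num_nodes, maximum=DEFAULT_REPLICATION):
    """Replication factor for the proposed "automatic" mode.

    Grows with the cluster from 1 up to the default (or a
    configured maximum), then stays flat as more nodes join.
    """
    if num_nodes < 1:
        raise ValueError("need at least one node")
    return min(num_nodes, maximum)


# 1 node -> 1, 2 nodes -> 2, 3 nodes -> 3, 4+ nodes -> stays at 3
```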

That should also include a flag that says, when ramping the replication factor 
up or down, whether to apply the new factor to all existing files in HDFS. That 
way you can add and remove nodes as needed.
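As a sketch of what that flag implies, here is a pure-Python model of the rescale step. In a real implementation the loop would call the existing FileSystem.setReplication() on each file instead of updating a dict; the names below are illustrative only:

```python
def rescale_replication(files, new_factor, apply_to_existing=True):
    """Model of ramping the replication factor up or down.

    files: dict mapping file path -> current replication factor.
    With the flag off, only the cluster-wide default changes and
    existing files keep their current factor; with it on, every
    file is re-replicated to the new factor.
    """
    if not apply_to_existing:
        return dict(files)  # existing files untouched
    return {path: new_factor for path in files}
```

For example, going from two nodes to three would bump every file from replication 2 to 3 when the flag is set.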

As for splitting out the HBM+NN to a separate machine, that really means 
shutting down the RS+DN process, or internal thread, on that machine. Adding a 
SNN or an HA NN is then a little beyond the automated scope. Well, the SNN we 
will need, so that should be handled, since clusters are meant to be up 
forever/a long time. But reconfiguring to non-automatic mode is then done using 
the CLI wizard or by editing the configs, followed by a rolling restart.

bq. I like the idea of bundling the master and regionserver in one binary 
better; no more special master treatment... any one can be master and/or a 
regionserver?

I do believe that is only useful in automatic mode, because on a larger cluster 
with dedicated roles you already have 2-3 master machines running NNs, ZKs, and 
so on. Adding Masters there is trivial (since most of the time it is all 
automatically deployed by admins), so, methinks, it would not really have a 
tangible advantage. But yes, when things are small, this is certainly something 
that we should have. See above.

                
> Embed HDFS into HBase
> ---------------------
>
>                 Key: HBASE-8480
>                 URL: https://issues.apache.org/jira/browse/HBASE-8480
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Lars George
>
> HBase is often a bit more involved to get going. We already have the option 
> to host ZooKeeper for very small clusters. We should have the same for HDFS. 
> The idea is that it adjusts replication based on the number of nodes, i.e. 
> from 1 to 3 (the default), so that you could start with a single node and 
> grow the cluster from there. Once the cluster reaches a certain size, and the 
> admin decides to split the components, we should have a way to export the 
> proper configs/settings so that you can easily start up an external HDFS 
> and/or ZooKeeper, while updating the HBase config as well to point to the new 
> "locations".
> The goal is to start a fully operational HBase that can grow from single 
> machine to multi machine clusters with just a single daemon on each machine.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
