Thanks Ryan for your answer.
Yes, I was mistaken. I also thought that the default install of HBase did
a one-node install of HDFS, and it seems that is wrong:
ps auwx | grep java
shows only two Java processes:
org.apache.hadoop.hbase.zookeeper.HQuorumPeer
and
org.apache.hadoop.hbase.master.HMaster
In the default HBase distribution, hbase-default.xml contains:
<name>hbase.rootdir</name>
<value>file:///tmp/hbase-${user.name}/hbase</value>
I thought that the dependency of HBase on HDFS was much stronger. From
the HBase configuration point of view, is hbase.rootdir the only
parameter that hooks HBase to HDFS?
Or does ZooKeeper also bind HBase to HDFS?
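To make the question concrete, here is the kind of override I assume one
would put in conf/hbase-site.xml to point HBase at HDFS instead of the
local file system. This is only a sketch: the namenode host and port are
placeholders I made up, not values from a real setup.

```xml
<!-- conf/hbase-site.xml: sketch only, host and port are hypothetical -->
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <!-- assumed HDFS namenode address; replace with the real one -->
    <value>hdfs://namenode.example.org:9000/hbase</value>
  </property>
</configuration>
```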
Is it true to say that HBase plays well with HDFS, but that it also
plays well with any POSIX-compliant filesystem?
For a small cluster, is it a good idea to *not* use HDFS as storage
for the HBase data?
If I accept losing one hour of HBase data, is it OK to make
hbase.rootdir point to a local (ext3) file system on the node and then
rsync that directory to another node every hour? I guess that rsync is
not ideal given the file structure used (it would generate a lot of
network traffic).
Thanks in advance,
TR
Ryan Rawson wrote:
I think you might be mistaken a bit - HBase layers on top of, and uses
hadoop. HBase uses HDFS for persistence, and thus the balancer config
and the other things you point out belong in the hadoop config.
3 nodes is a little light for HDFS... With r=3, there are no spares.