Hey,

When you start hbase in a fresh installation it will use the local fs in /tmp. The hadoop filesystem libraries we use support at least 3 filesystems (local, hdfs, kfs). Right now you are seeing the single ZK process and the combined HBase master/regionserver process.
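For reference, a sketch of how hbase.rootdir can be pointed at each of those backends in hbase-site.xml (this property block sits inside the usual <configuration> element; the hostnames, ports, and paths below are illustrative placeholders, not values from this thread):

```xml
<!-- hbase-site.xml: override the rootdir; values here are examples only -->
<property>
  <name>hbase.rootdir</name>
  <!-- local filesystem (the out-of-the-box default is
       file:///tmp/hbase-${user.name}/hbase) -->
  <value>file:///var/hbase</value>
  <!-- or HDFS:  <value>hdfs://namenode:9000/hbase</value> -->
  <!-- or KFS:   <value>kfs://metaserver:20000/hbase</value> -->
</property>
```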
HBase needs the following things out of its filesystem:
- global view - every single regionserver & master MUST see every file from everyone at all times. A 1-hour rsync won't cut it.
- high bandwidth - once you get 3+ servers doing heavy IO (compaction, etc.), you won't want to rely on a single-disk NFS.

In theory you can use something like NFS with a common mount dir on all regionservers/masters. It should _in theory_ work, but of course it won't scale. You can specify the rootdir with something like "file:///nfs_mount_path/hbase". Normally we'd say hdfs://namenode:port/hbase.

The hbase scripts don't boot up or control hadoop at all. You must provide a working hadoop, and then hbase can use it. It may seem a little "annoying" to have a 2-step process, but the decoupled control makes our control scripts more generic and suitable for all setups.

Good luck out there!
-ryan

On Fri, Nov 27, 2009 at 1:55 AM, Tux Racer <[email protected]> wrote:
> Thanks Ryan for your answer.
> Yes, I was mistaken; I also thought that the default install of hbase did a
> one-node install of HDFS, and it seems that was wrong:
>
> ps auwx | grep java
>
> shows only two java processes:
>
> org.apache.hadoop.hbase.zookeeper.HQuorumPeer
> and
> org.apache.hadoop.hbase.master.HMaster
>
> In the default hbase distribution we have, in hbase-default.xml:
>
> <name>hbase.rootdir</name>
> <value>file:///tmp/hbase-${user.name}/hbase</value>
>
> I thought that the dependency of hbase on HDFS was much stronger. From the
> hbase configuration point of view, is the hbase.rootdir parameter the only
> parameter that hooks hbase to HDFS?
> Or does zookeeper also bind hbase to HDFS?
> Is it true to say that hbase plays well with HDFS, but that it plays
> well with any POSIX-compliant filesystem too?
> For a small cluster, is it a good idea to *not* use HDFS as storage for
> the hbase data?
> If I accept to lose one hour of hbase data, is it OK to make hbase.rootdir
> point to a local (ext3) filesystem on the node and then rsync that
> directory to another node each hour? I guess that rsync is not ideal due
> to the file structure used (it will generate a lot of network traffic).
>
> Thanks in advance,
> TR
>
> Ryan Rawson wrote:
>>
>> I think you might be mistaken a bit - HBase layers on top of, and uses,
>> hadoop. HBase uses HDFS for persistence, and thus the balancer config
>> and the other things you point out belong in the hadoop config.
>>
>> 3 nodes is a little light for HDFS... With r=3, there are no spares.
>>
>>
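To make the decoupled 2-step process described above concrete, a typical startup sequence looks something like the following command sketch (script names as shipped in Hadoop/HBase releases of that era; HADOOP_HOME and HBASE_HOME are assumed to point at your installs):

```
# Step 1: bring up HDFS yourself -- the hbase scripts will not do it for you
$HADOOP_HOME/bin/start-dfs.sh

# Step 2: with hbase.rootdir pointed at hdfs://namenode:port/hbase,
# start hbase on top of the already-running HDFS
$HBASE_HOME/bin/start-hbase.sh
```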
