On Wed, May 6, 2009 at 3:08 AM, Rakhi Khatwani <[email protected]> wrote:
> 1. I want to scan a table, but the table is really huge, so I want to write
> the result of the scan to some file so that I can analyze it. How do we go
> about it?

If your table is really huge, I presume your scan output is huge too. You
should write a MapReduce job. See TableInputFormat in the hbase mapred
package, and see RowCounter in the same package for an example of pulling
rows from HBase in an MR job. I've put a rough sketch at the end of this
mail.

> 2. How do you dynamically add and remove nodes in the cluster without
> disturbing the existing map-reduce jobs (on EC2)?

Are there datanodes and tasktrackers running on the same node as the
regionserver? If so, it may be a little tricky coordinating them all. You
could try shutting down the regionserver with ./bin/hbase-daemon.sh stop
regionserver (IIRC) and then just let the server go.

> 3. Suppose my dfs replication is 3 (which implies that I have 3 copies of
> the data in some other location). How do I access the other two copies, or
> how do I find out on which machines the other copies are kept?

A file is made of blocks in HDFS; it's the blocks that are replicated. To
see where the replicas are, you could enable debug logging. You might also
be able to see where they all are in the HDFS UI; I'm not sure. There's a
rough sketch of one way to look them up at the end of this mail too.

> 4. If my HBase master fails, is there any way I can set another node as my
> HBase master without disturbing my current map-reduce jobs?

Coming in HBase 0.20.0.

St.Ack
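Here is the sketch for question 1: a rough, untested map-only job written
against the 0.20-era org.apache.hadoop.hbase.mapred API. The table name
"mytable", the column spec "myfamily:", and the output path are made-up
placeholders, not anything that ships with HBase.

import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.io.RowResult;
import org.apache.hadoop.hbase.mapred.TableMap;
import org.apache.hadoop.hbase.mapred.TableMapReduceUtil;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

/** Map-only job that dumps a table's rows to text files in HDFS. */
public class TableDump extends MapReduceBase implements TableMap<Text, Text> {

  public void map(ImmutableBytesWritable row, RowResult value,
      OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    // Emit the row key and the row's cells; real code would walk
    // value.entrySet() and format each Cell instead of using toString().
    output.collect(new Text(row.get()), new Text(value.toString()));
  }

  public static void main(String[] args) throws IOException {
    JobConf job = new JobConf(new HBaseConfiguration(), TableDump.class);
    job.setJobName("tabledump");
    // Scan "mytable" fetching all of family "myfamily"; TableInputFormat
    // hands each map roughly one region's worth of rows.
    TableMapReduceUtil.initTableMapJob("mytable", "myfamily:",
        TableDump.class, Text.class, Text.class, job);
    job.setNumReduceTasks(0); // map-only: maps write straight to the output dir
    FileOutputFormat.setOutputPath(job, new Path("/tmp/tabledump"));
    JobClient.runJob(job);
  }
}

Because the splits follow the regions, the scan is spread over the cluster
instead of being funnelled through a single client; the output lands as
part-* text files under /tmp/tabledump.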
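And the sketch for question 3, also untested. It uses the plain Hadoop
FileSystem client API to ask the namenode which datanodes hold each block
of a file; the class name and the path argument are made up for the example.

import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/** Prints which datanodes hold each block of the file given as args[0]. */
public class BlockLocations {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration(); // picks up hadoop-site.xml
    FileSystem fs = FileSystem.get(conf);
    FileStatus stat = fs.getFileStatus(new Path(args[0]));
    // Ask the namenode for the replica locations of every block.
    BlockLocation[] blocks =
        fs.getFileBlockLocations(stat, 0, stat.getLen());
    for (int i = 0; i < blocks.length; i++) {
      System.out.println("block " + i + " on "
          + Arrays.toString(blocks[i].getHosts()));
    }
  }
}

From the shell, hadoop fsck <path> -files -blocks -locations prints much the
same thing. Note you don't pick which replica a read uses; the DFS client
chooses one for you (normally the closest).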
