get the impact hbase brings to HDFS, datanode log exploded after we started HBase.

steven zhuang Thu, 08 Apr 2010 02:52:27 -0700

Hi all,

        I posted about this before, but I hope a more descriptive title will
get more help from you.
        We have set up an HBase cluster on our 30-node cluster, and we have
noticed that the datanode logs have grown much larger since we added
HBase.


        The datanode log has grown from about 20 MB to 700-800 MB.
        A larger log means more data access and transfer, and we want to
reduce this if we can.

        I know some read/write traffic from HBase is unavoidable, but we want
to know how much HBase contributes to the datanode log, or otherwise why the
datanode log is now so big.

        At present, my idea is to calculate the amount of data I/O for both
HDFS and HBase on a given day, and use the result to get a rough estimate of
the situation.
        One problem I have run into is working out from the regionserver log
how much data HBase has read or written. Should I count the lengths in the
following log records as the lengths of data read or written?

org.apache.hadoop.hbase.regionserver.Store: loaded
/user/ccenterq/hbase/hbt2table2/165204266/queries/1091785486701083780,
isReference=false,
sequence id=1526201715, length=*72426373*, majorCompaction=true
2010-03-04 01:11:54,262 DEBUG org.apache.hadoop.hbase.regionserver.HRegion:
Started memstore flush for region table_word_in_doc, resort
all-2010/01/01,1267629092479. Current region memstore size *40.5m*

        Here I am not sure whether *72426373* and *40.5m* are the lengths (in
bytes) of data read by HBase.
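
        To make the counting idea concrete, here is a minimal sketch of such
a tally in Python. It is modelled only on the two sample records quoted
above, and it rests on assumptions that are exactly what this mail is asking
about: that "length=" is a byte count, and that the "m" suffix on the
memstore size means mebibytes. The function name, the regular expressions and
the unit table are hypothetical, not part of HBase.

```python
import re

# Assumed patterns, derived only from the two sample log records above.
# The optional \* allows for the emphasis markers added in this mail.
STORE_LOADED = re.compile(r"length=\*?(\d+)")
MEMSTORE_FLUSH = re.compile(r"memstore size \*?(\d+(?:\.\d+)?)([kmg]?)", re.IGNORECASE)

# Assumption: suffixes are binary multiples (k = 1024 bytes, and so on).
UNIT = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def tally(lines):
    """Sum the store-file lengths and memstore flush sizes found in log lines."""
    totals = {"store_loaded_bytes": 0, "memstore_flush_bytes": 0}
    for line in lines:
        match = STORE_LOADED.search(line)
        if match:
            totals["store_loaded_bytes"] += int(match.group(1))
        match = MEMSTORE_FLUSH.search(line)
        if match:
            size = float(match.group(1)) * UNIT[match.group(2).lower()]
            totals["memstore_flush_bytes"] += int(size)
    return totals


if __name__ == "__main__":
    sample = [
        "sequence id=1526201715, length=72426373, majorCompaction=true",
        "Current region memstore size 40.5m",
    ]
    print(tally(sample))
```

        Running the same function over one day of regionserver log lines
would give the HBase side of the comparison, provided these two values really
are the right quantities to count.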

        Am I heading in the wrong direction, or is there a better approach?
Any help is appreciated.
