Hi all,

I posted a mail about this before, but I think a different title may get more help from you. We have set up an HBase cluster on our 30-node cluster, and we have noticed that the datanode logs have grown much bigger since HBase was added.
The datanode log has grown from around 20 MB to 700-800 MB. A larger log means more data access and transfer, and we want to lower this if we can. I know some read/write traffic is unavoidable with HBase, but we want to know how much HBase contributes to the datanode log, or otherwise why the datanode log is now so big.

At present, my idea is to calculate the data I/O volume of both HDFS and HBase for a given day, which should give us a rough estimate of the situation.

One problem I have run into is how to determine, from the regionserver log, the amount of data read and written by HBase. Should I count the lengths in the following log records as the lengths of data read/written?

    org.apache.hadoop.hbase.regionserver.Store: loaded /user/ccenterq/hbase/hbt2table2/165204266/queries/1091785486701083780, isReference=false, sequence id=1526201715, length=72426373, majorCompaction=true

    2010-03-04 01:11:54,262 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memstore flush for region table_word_in_doc, resort all-2010/01/01,1267629092479. Current region memstore size 40.5m

I am not sure whether the 72426373 (length) and the 40.5m (memstore size) here are lengths, in bytes, of data read by HBase.

Am I heading in the wrong direction, or is there a better approach? Any help is appreciated.
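
For reference, here is a minimal sketch of the tally I have in mind, written in Python. It only adds up the two figures quoted in the log records; whether they actually represent data read or written by HBase is exactly the open question. The regular expressions are assumptions based solely on those two records, the function name tally is made up for illustration, and treating the "m" suffix as 1024*1024 bytes is also an assumption.

```python
import re

# Assumed patterns, derived only from the two regionserver records quoted
# in this mail; other HBase versions may log these events differently.
STORE_LOADED = re.compile(r"regionserver\.Store: loaded \S+,.*?\blength=(\d+)")
MEMSTORE_FLUSH = re.compile(
    r"Started memstore flush for region .*"
    r"Current region memstore size ([\d.]+)([kmg]?)",
    re.IGNORECASE,
)

# Assumption: size suffixes are powers of 1024.
UNIT = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def tally(lines):
    """Sum store-file lengths and memstore sizes found in the given log lines.

    Returns (store_bytes, memstore_bytes). Hypothetical helper, not part of HBase.
    """
    store_bytes = 0
    memstore_bytes = 0
    for line in lines:
        m = STORE_LOADED.search(line)
        if m:
            store_bytes += int(m.group(1))
            continue
        m = MEMSTORE_FLUSH.search(line)
        if m:
            size = float(m.group(1)) * UNIT[m.group(2).lower()]
            memstore_bytes += int(size)
    return store_bytes, memstore_bytes


# The two records from this mail, used as sample input.
sample = [
    "org.apache.hadoop.hbase.regionserver.Store: loaded "
    "/user/ccenterq/hbase/hbt2table2/165204266/queries/1091785486701083780, "
    "isReference=false, sequence id=1526201715, length=72426373, "
    "majorCompaction=true",
    "2010-03-04 01:11:54,262 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: "
    "Started memstore flush for region table_word_in_doc, resort "
    "all-2010/01/01,1267629092479. Current region memstore size 40.5m",
]

store_total, memstore_total = tally(sample)
print(store_total, memstore_total)
```

To run it over a whole day, pass an open regionserver log file to tally() instead of the sample list, since iterating a file yields its lines.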