It'll depend on your access patterns, but in general we'll be doing lots of small accesses... many more. A recently added clienttrace log (the client referred to here is the DFSClient) will log messages like the following:
2010-04-07 22:15:52,078 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /10.20.20.189:50010, dest: /10.20.20.189:56736, bytes: 2022080, op: HDFS_READ, cliID: DFSClient_-994492608, srvID: DS-1740361948-10.20.20.189-50010-1270703663528, blockid: blk_2797215769808904384_1015

Lots of them, one per access. You could turn them off explicitly in your log4j configuration. That should help. Also, don't run the datanode logs at DEBUG level.

Other answers inlined below.

On Thu, Apr 8, 2010 at 2:51 AM, steven zhuang <steven.zhuang.1...@gmail.com> wrote:
>...
> At present, my idea is calculating the data IO quantity of both HDFS
> and HBase for a given day, and with the result we can have a rough estimate
> of the situation.

Can you use the clienttrace logs noted above to do this? Are the clients on different hosts, i.e. the hdfs clients and the hbase clients? If so, that'd make it easy enough. Otherwise, it'd be a little difficult. There is probably an easier way, but one (awkward) means of calculating would be to write a mapreduce job that takes the clienttrace messages and all blocks in the filesystem and sorts out the clienttrace messages that belong to the ${HBASE_ROOTDIR} subdirectory.

> One problem I met now is to decide from the regionserver log the
> quantity of data been read/written by Hbase, should I count the lengths in
> following log records as lengths of data been read/written?:
>
> org.apache.hadoop.hbase.regionserver.Store: loaded
> /user/ccenterq/hbase/hbt2table2/165204266/queries/1091785486701083780,
> isReference=false,
> sequence id=1526201715, length=*72426373*, majorCompaction=true
> 2010-03-04 01:11:54,262 DEBUG org.apache.hadoop.hbase.regionserver.HRegion:
> Started memstore flush for region table_word_in_doc, resort
> all-2010/01/01,1267629092479. Current region memstore size *40.5m*
>
> here I am not sure the *72426373/40.5m is the length (in byte) of
> data read by HBase. *

That's just file size. Above we opened a storefile and we just logged its size.
We don't log how much we've read/written anywhere in the hbase logs.

St.Ack
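
P.S. As a rough starting point for totalling I/O from the clienttrace messages, here is a minimal Python sketch. It assumes the lines look exactly like the sample above; the names CLIENTTRACE, sum_bytes and SAMPLE are made up for illustration and are not part of hadoop or hbase. It only sums the bytes field per (op, cliID) pair. Working out which cliID belongs to HBase rather than to a plain hdfs client is still up to you, for example by host if they run on different machines.

```python
import re
from collections import defaultdict

# Pattern for the fields of a DataNode clienttrace line, based only on the
# sample message in this thread (an assumption; other versions may differ).
CLIENTTRACE = re.compile(
    r"clienttrace: src: /(?P<src>[\d.]+):\d+, dest: /(?P<dest>[\d.]+):\d+, "
    r"bytes: (?P<bytes>\d+), op: (?P<op>\w+), cliID: (?P<cli>[^,]+),"
)


def sum_bytes(lines):
    """Return {(op, cliID): total bytes} over all clienttrace lines."""
    totals = defaultdict(int)
    for line in lines:
        m = CLIENTTRACE.search(line)
        if m is None:
            continue  # not a clienttrace message, skip it
        totals[(m.group("op"), m.group("cli"))] += int(m.group("bytes"))
    return dict(totals)


# The sample line from the message above, used as a quick self-check.
SAMPLE = (
    "2010-04-07 22:15:52,078 INFO "
    "org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: "
    "src: /10.20.20.189:50010, dest: /10.20.20.189:56736, bytes: 2022080, "
    "op: HDFS_READ, cliID: DFSClient_-994492608, "
    "srvID: DS-1740361948-10.20.20.189-50010-1270703663528, "
    "blockid: blk_2797215769808904384_1015"
)
```

To use it on a day's worth of datanode logs, pass an open file (or any iterable of lines) to sum_bytes and print the resulting totals. On a big cluster the same per-line parsing could go in the map step of the mapreduce job mentioned above.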