Hi Jonathan,

On Wed, Apr 7, 2010 at 6:15 AM, Jonathan Gray <jg...@facebook.com> wrote:
> Can you explain more about what information you are trying to find out?
>
> You had an existing HDFS and you want to measure the additional impact
> adding HBase is?  Is that in terms of reads/writes/iops or data size?

I just want to get the additional I/O data size after adding HBase to Hadoop.

> If you have a steady-state set of metrics for HDFS w/o HBase, can you not
> just monitor those metrics w/ HBase running and calculate the deltas?

Those HBase apps are done by different people, so it's hard to track the amount of data I/O.

> Also, to what end are you trying to figure this out?  I'm very much
> interested in what courses of actions you might take given the different
> information you could find out about HBase's influence on your cluster.

I want to convince my leader that more RAM for the regionservers will lower the I/O rate, as there should be less swapping, but I have to get the comparison results first.

> JG
>
> > -----Original Message-----
> > From: steven zhuang [mailto:steven.zhuang.1...@gmail.com]
> > Sent: Tuesday, April 06, 2010 8:34 AM
> > To: hbase-user@hadoop.apache.org
> > Subject: how can I check the I/O influence HBase to HDFS
> >
> > hi, there,
> >     I have this problem of checking the influence HBase brought to HDFS.
> >     I have a Hadoop cluster which has 30+ data nodes, and an HBase
> > cluster based on it, with 18 regionservers residing on 18 datanodes.
> >     We have observed that the HDFS IO increases a lot when we do some
> > importing or query ops on HBase tables, but we don't know how much
> > HBase impacts HDFS, so now I have to dig into this.
> >     My idea is as follows:
> >
> >     1. grep from the regionservers' logs the file information of
> >        HBase tables, which mainly should be store files' names and
> >        their sizes, and sum the sizes up.
> >     2. grep from the datanodes' logs the HDFS_READ/HDFS_WRITE
> >        entries, and calculate the total IO bytes.
> >     3. get the ratio of HBase IO to HDFS IO.
> >
> >     My concern is: if the above idea is right, is there anything
> > missing, or is there a better way to do this?
> >
> >     And to make it more convincing, I want to have the block info
> > for each HTable, not just for the store files under each table's
> > directory, but also for those store files which were later removed by
> > major compaction. Since all I can see in the datanode log is the
> > block id, any pointer or hint is really appreciated.
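
For reference, here is a minimal sketch in Python of steps 2 and 3 from my original mail. It assumes each relevant datanode log line contains fields of the form "bytes: N, op: HDFS_READ" or "bytes: N, op: HDFS_WRITE"; that pattern should be checked against the real logs first. The function names and the sample lines are made up for illustration, and the HBase byte count from step 1 is simply taken as an input.

```python
import re
from collections import defaultdict

# Assumed shape of the relevant fields in a datanode log line.
# Verify this against your own logs before relying on it.
TRACE_RE = re.compile(r"bytes: (\d+), op: (HDFS_READ|HDFS_WRITE)")


def sum_io_bytes(lines):
    """Step 2: total bytes per operation type found in the log lines."""
    totals = defaultdict(int)
    for line in lines:
        match = TRACE_RE.search(line)
        if match:
            totals[match.group(2)] += int(match.group(1))
    return dict(totals)


def hbase_share(hbase_bytes, hdfs_bytes):
    """Step 3: ratio of HBase IO to total HDFS IO (0.0 if there is no HDFS IO)."""
    return hbase_bytes / hdfs_bytes if hdfs_bytes else 0.0


# Made-up sample lines, for illustration only.
sample = [
    "INFO clienttrace: src: /10.0.0.1:50010, dest: /10.0.0.2:40000, "
    "bytes: 1000, op: HDFS_READ, cliID: DFSClient_1, blockid: blk_1",
    "INFO clienttrace: src: /10.0.0.2:40001, dest: /10.0.0.1:50010, "
    "bytes: 3000, op: HDFS_WRITE, cliID: DFSClient_2, blockid: blk_2",
    "INFO some unrelated log line",
]

totals = sum_io_bytes(sample)
hdfs_total = sum(totals.values())
```

In practice the lines would come from the datanode log files on each node (for example by iterating over open(path) for each file), and the per-node totals would be added together before computing the ratio.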