Slow MapReduce using HBase, regardless of number of machines

yair even-zohar Wed, 09 Jul 2008 08:54:41 -0700

I am testing HBase 0.1.2 and am getting the following performance using the
RowCounter class. (I had to modify the main() method of the original class
because it contains some hardcoded parameters :-)


Single regionserver - counting 200,000 lines in 60 or 61 seconds
5 regionservers - counting 200,000 lines in 55 or 58 seconds
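
To put these timings in perspective, here is a small sketch that converts them to rows per second and to a speedup ratio. The class and method names (ScanThroughput, rowsPerSecond, speedup) are hypothetical and used only for illustration; the inputs are the figures reported above.

```java
public class ScanThroughput {
    // Rows processed per second for a given row count and elapsed time.
    public static double rowsPerSecond(long rows, double seconds) {
        return rows / seconds;
    }

    // Speedup of a run relative to a baseline run over the same rows.
    public static double speedup(double baselineSeconds, double seconds) {
        return baselineSeconds / seconds;
    }

    public static void main(String[] args) {
        long rows = 200000L;                  // rows counted in each run
        double[] single = {60.0, 61.0};       // 1 regionserver, seconds
        double[] five = {55.0, 58.0};         // 5 regionservers, seconds

        for (double s : single) {
            System.out.printf("1 regionserver,  %.0f s: %.0f rows/s%n",
                    s, rowsPerSecond(rows, s));
        }
        for (double s : five) {
            System.out.printf("5 regionservers, %.0f s: %.0f rows/s%n",
                    s, rowsPerSecond(rows, s));
        }

        // Smallest and largest speedup consistent with the reported timings.
        System.out.printf("speedup range: %.2fx to %.2fx%n",
                speedup(single[0], five[1]), speedup(single[1], five[0]));
    }
}
```

With these inputs, going from one to five regionservers gives a speedup of only about 1.03x to 1.11x, far from the 5x that linear scaling would give.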

Clearly, one expects better performance, so I assume I'm doing something wrong.
By the way, I get about the same performance when I iterate through a
scanner without MapReduce.

Here is my hadoop-site.xml

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://sb-centercluster01:9100</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>hdfs://sb-centercluster01:9101</value>
  </property>
  <property>
    <name>mapred.map.tasks</name>
    <value>13</value>
  </property>
  <property>
    <name>mapred.reduce.tasks</name>
    <value>5</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/home/hadoop/dfs16,/tmp/hadoop/dfs16</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/state/partition1/hadoop/dfs16</value>
  </property>
</configuration>

Increasing "io.bytes.per.checksum" and "io.file.buffer.size" didn't help.
Neither did decreasing "dfs.replication".

Here is my hbase-site.xml

<configuration>
  <property>
    <name>hbase.master</name>
    <value>sb-centercluster01:60002</value>
    <description>The host and port that the HBase master runs at.
    </description>
  </property>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://sb-centercluster01:9100/hbase</value>
    <description>The directory shared by region servers.
    </description>
  </property>
  <property>
    <name>hbase.io.index.interval</name>
    <value>8</value>
  </property>
</configuration>


Any help will be appreciated.

Thanks
-Yair
