The table consists of 2 column families with 1,000 and 10,000 columns respectively. The average line contains 3 or 4 integers (IntWritables) spread over these columns.
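Rows this small make it easy to see why the region count in the quoted thread matters. Below is a rough back-of-envelope sketch; the ~100 bytes of storage per cell and the 256 MB region split threshold are assumptions for illustration, not measurements from this cluster:

```java
// Back-of-envelope estimate: does a 200,000-row table of tiny rows
// fit in a single region? All constants below are assumptions.
public class RegionSizeEstimate {
    public static void main(String[] args) {
        long rows = 200_000L;
        long cellsPerRow = 4;        // 3-4 IntWritables per row
        long bytesPerCell = 100;     // assumed key + column + timestamp + value overhead
        long tableBytes = rows * cellsPerRow * bytesPerCell;
        long regionSplitBytes = 256L * 1024 * 1024;  // assumed default split size of the era
        System.out.println("Estimated table size: " + tableBytes / (1024 * 1024) + " MB");
        System.out.println("Fits in one region: " + (tableBytes < regionSplitBytes));
        // prints: Estimated table size: 76 MB
        //         Fits in one region: true
    }
}
```

If the whole table comes in well under the split threshold, it never splits, so only one regionserver serves every scan no matter how many servers join the cluster — which would be consistent with the nearly flat timings reported in the quoted thread.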
-----Original Message-----
From: Bryan Duxbury [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, July 09, 2008 11:53 AM
To: [email protected]
Subject: Re: Slow mapreduce using Hbase, regardless of number of machines

Well, how big is a single line?

On Jul 9, 2008, at 9:30 AM, Yair Even-Zohar wrote:

> How do I find the number of regions for an HTable?
> In a quick lookup I did on the actual machines, it seems that all the
> machines had new data in them once I loaded the table.
>
> Thanks
> -Yair
>
> -----Original Message-----
> From: Bryan Duxbury [mailto:[EMAIL PROTECTED]]
> Sent: Wednesday, July 09, 2008 11:13 AM
> To: [email protected]
> Subject: Re: Slow mapreduce using Hbase, regardless of number of
> machines
>
> How many regions are there in your table? If your 200k rows fit
> inside a single region, adding more region servers isn't going to
> make anything faster, because only one server will be participating.
>
> -Bryan
>
> On Jul 9, 2008, at 7:36 AM, yair even-zohar wrote:
>
>> I am testing HBase 0.1.2 and am getting the following performance
>> using the RowCounter class (I had to modify the main() method of the
>> original class because it contains some hardcoded parameters :-)
>>
>> Single regionserver - counting 200,000 lines in 60 or 61 seconds
>> 5 regionservers - counting 200,000 lines in 55 or 58 seconds
>>
>> Clearly, one expects better performance, so I assume I'm doing
>> something wrong. By the way, I'm getting about the same performance
>> when I'm iterating through a scanner without the mapreduce.
>>
>> Here is my hadoop-site.xml:
>>
>> <configuration>
>>   <property>
>>     <name>fs.default.name</name>
>>     <value>hdfs://sb-centercluster01:9100</value>
>>   </property>
>>   <property>
>>     <name>mapred.job.tracker</name>
>>     <value>hdfs://sb-centercluster01:9101</value>
>>   </property>
>>   <property>
>>     <name>mapred.map.tasks</name>
>>     <value>13</value>
>>   </property>
>>   <property>
>>     <name>mapred.reduce.tasks</name>
>>     <value>5</value>
>>   </property>
>>   <property>
>>     <name>dfs.replication</name>
>>     <value>3</value>
>>   </property>
>>   <property>
>>     <name>dfs.name.dir</name>
>>     <value>/home/hadoop/dfs16,/tmp/hadoop/dfs16</value>
>>   </property>
>>   <property>
>>     <name>dfs.data.dir</name>
>>     <value>/state/partition1/hadoop/dfs16</value>
>>   </property>
>> </configuration>
>>
>> Increasing "io.bytes.per.checksum" and "io.file.buffer.size" didn't
>> help. Neither did decreasing "dfs.replication".
>>
>> Here is my hbase-site.xml:
>>
>> <configuration>
>>   <property>
>>     <name>hbase.master</name>
>>     <value>sb-centercluster01:60002</value>
>>     <description>The host and port that the HBase master runs at.
>>     </description>
>>   </property>
>>   <property>
>>     <name>hbase.rootdir</name>
>>     <value>hdfs://sb-centercluster01:9100/hbase</value>
>>     <description>The directory shared by region servers.
>>     </description>
>>   </property>
>>   <property>
>>     <name>hbase.io.index.interval</name>
>>     <value>8</value>
>>   </property>
>> </configuration>
>>
>> Any help will be appreciated.
>>
>> Thanks
>> -Yair
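The hbase-site.xml above only tunes hbase.io.index.interval; for a small test table, the bigger lever for spreading load is the region split threshold. As a sketch: hbase.hregion.max.filesize is the standard split-size property, and lowering it before loading forces a small table to split into more regions. The 64 MB value below is purely illustrative, not a recommendation:

```xml
<property>
  <name>hbase.hregion.max.filesize</name>
  <!-- Split regions sooner so a small test table spreads across
       regionservers. 64 MB is an illustrative value only. -->
  <value>67108864</value>
</property>
```

More regions means more map tasks can scan in parallel, which is the condition Bryan's region question is probing for.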
