The table consists of 2 column families with 1,000 and 10,000 columns respectively. The average line contains 3 or 4 integers (IntWritables) spread over these columns.
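Rows this small make it easy to see why the region count in the quoted thread matters. Below is a rough back-of-envelope sketch; the ~100 bytes of storage per cell and the 256 MB region split threshold are assumptions for illustration, not measurements from this cluster:

```java
// Back-of-envelope estimate: does a 200,000-row table of tiny rows
// fit in a single region? All constants below are assumptions.
public class RegionSizeEstimate {
    public static void main(String[] args) {
        long rows = 200_000L;
        long cellsPerRow = 4;        // 3-4 IntWritables per row
        long bytesPerCell = 100;     // assumed key + column + timestamp + value overhead
        long tableBytes = rows * cellsPerRow * bytesPerCell;
        long regionSplitBytes = 256L * 1024 * 1024;  // assumed default split size of the era
        System.out.println("Estimated table size: " + tableBytes / (1024 * 1024) + " MB");
        System.out.println("Fits in one region: " + (tableBytes < regionSplitBytes));
        // prints: Estimated table size: 76 MB
        //         Fits in one region: true
    }
}
```

If the whole table comes in well under the split threshold, it never splits, so only one regionserver serves every scan no matter how many servers join the cluster — which would be consistent with the nearly flat timings reported in the quoted thread.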
-----Original Message-----
From: Bryan Duxbury [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, July 09, 2008 11:53 AM
To: [email protected]
Subject: Re: Slow mapreduce using Hbase, regardless of number of machines

Well, how big is a single line?

On Jul 9, 2008, at 9:30 AM, Yair Even-Zohar wrote:

> How do I find the number of regions for an HTable?
> In a quick lookup I did on the actual machines, it seems that all the
> machines had new data in them once I loaded the table.
>
> Thanks
> -Yair
>
> -----Original Message-----
> From: Bryan Duxbury [mailto:[EMAIL PROTECTED]]
> Sent: Wednesday, July 09, 2008 11:13 AM
> To: [email protected]
> Subject: Re: Slow mapreduce using Hbase, regardless of number of
> machines
>
> How many regions are there in your table? If your 200k rows fit
> inside a single region, adding more region servers isn't going to
> make anything faster, because only one server will be participating.
>
> -Bryan
>
> On Jul 9, 2008, at 7:36 AM, yair even-zohar wrote:
>
>> I am testing HBase 0.1.2 and am getting the following performance
>> using the RowCounter class (I had to modify the main() method of the
>> original class because it contains some hardcoded parameters :-)
>>
>> Single regionserver - counting 200,000 lines in 60 or 61 seconds
>> 5 regionservers - counting 200,000 lines in 55 or 58 seconds
>>
>> Clearly, one expects better performance, so I assume I'm doing
>> something wrong. By the way, I'm getting about the same performance
>> when I'm iterating through a scanner without the mapreduce.
>>
>> Here is my hadoop-site.xml:
>>
>> <configuration>
>>   <property>
>>     <name>fs.default.name</name>
>>     <value>hdfs://sb-centercluster01:9100</value>
>>   </property>
>>   <property>
>>     <name>mapred.job.tracker</name>
>>     <value>hdfs://sb-centercluster01:9101</value>
>>   </property>
>>   <property>
>>     <name>mapred.map.tasks</name>
>>     <value>13</value>
>>   </property>
>>   <property>
>>     <name>mapred.reduce.tasks</name>
>>     <value>5</value>
>>   </property>
>>   <property>
>>     <name>dfs.replication</name>
>>     <value>3</value>
>>   </property>
>>   <property>
>>     <name>dfs.name.dir</name>
>>     <value>/home/hadoop/dfs16,/tmp/hadoop/dfs16</value>
>>   </property>
>>   <property>
>>     <name>dfs.data.dir</name>
>>     <value>/state/partition1/hadoop/dfs16</value>
>>   </property>
>> </configuration>
>>
>> Increasing "io.bytes.per.checksum" and "io.file.buffer.size" didn't
>> help. Neither did decreasing "dfs.replication".
>>
>> Here is my hbase-site.xml:
>>
>> <configuration>
>>   <property>
>>     <name>hbase.master</name>
>>     <value>sb-centercluster01:60002</value>
>>     <description>The host and port that the HBase master runs at.
>>     </description>
>>   </property>
>>   <property>
>>     <name>hbase.rootdir</name>
>>     <value>hdfs://sb-centercluster01:9100/hbase</value>
>>     <description>The directory shared by region servers.
>>     </description>
>>   </property>
>>   <property>
>>     <name>hbase.io.index.interval</name>
>>     <value>8</value>
>>   </property>
>> </configuration>
>>
>> Any help will be appreciated.
>>
>> Thanks
>> -Yair
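The hbase-site.xml above only tunes hbase.io.index.interval; for a small test table, the bigger lever for spreading load is the region split threshold. As a sketch: hbase.hregion.max.filesize is the standard split-size property, and lowering it before loading forces a small table to split into more regions. The 64 MB value below is purely illustrative, not a recommendation:

```xml
<property>
  <name>hbase.hregion.max.filesize</name>
  <!-- Split regions sooner so a small test table spreads across
       regionservers. 64 MB is an illustrative value only. -->
  <value>67108864</value>
</property>
```

More regions means more map tasks can scan in parallel, which is the condition Bryan's region question is probing for.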
