New HBase tables start with one region. By default, a region is split into
more regions when the size of the backing store file for any column family of
the table exceeds 256MB. Until the table splits, you are guaranteed that only
one RegionServer will be serving the table. Furthermore, the TableMap utility
class sets the number of map tasks for a job equal to the number of regions in
the table. Taking I/O into account, this makes sense.

One way to speed up splitting a table into multiple regions is to lower the
hbase.hregion.max.filesize configuration parameter. I would advise against
setting this value smaller than the DFS block size.
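As a rough sketch, the override would go in hbase-site.xml along these lines.
The 64MB value is only an illustration, picked on the assumption that your
cluster uses the stock DFS block size of 64MB (dfs.block.size = 67108864);
keep the value at or above whatever block size your cluster actually uses.

```xml
<configuration>
  <!-- Illustrative value (assumption): 64MB instead of the 256MB
       (268435456) default, so regions split sooner.
       Do not set this below your dfs.block.size. -->
  <property>
    <name>hbase.hregion.max.filesize</name>
    <value>67108864</value>
  </property>
</configuration>
```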

Even so, until you store a substantial amount of data in your test table(s),
there is little if any parallelism available, and you still incur the
overhead of Hadoop job scheduling.

Hope this helps,

   - Andy

--- On Wed, 7/9/08, Yair Even-Zohar <[EMAIL PROTECTED]> wrote:

> From: Yair Even-Zohar <[EMAIL PROTECTED]>
> Subject: RE: Slow mapreduce using Hbase , regardless on number of machines
> To: [email protected]
> Date: Wednesday, July 9, 2008, 9:30 AM
> How do I find the number of regions for an HTable?
> In a quick lookup I did on the actual machines, it seems that all the
> machines had new data in them once I loaded the table.
> 
> Thanks
> -Yair
> 
> -----Original Message-----
> From: Bryan Duxbury [mailto:[EMAIL PROTECTED] 
> Sent: Wednesday, July 09, 2008 11:13 AM
> To: [email protected]
> Subject: Re: Slow mapreduce using Hbase , regardless on number of machines
> 
> How many regions are there in your table? If your 200k rows fit
> inside a single region, adding more region servers isn't going to
> make anything faster because only one server will be participating.
> 
> -Bryan
> 
> On Jul 9, 2008, at 7:36 AM, yair even-zohar wrote:
> 
> > I am testing HBase 0.1.2 and am getting the following performance
> > using the RowCounter class (I had to modify the main() method of the
> > original class because it contains some hard-coded parameters :-)
> >
> > Single regionserver - counting 200,000 lines in 60 or 61 seconds
> > 5 regionservers - counting 200,000 lines in 55 or 58 seconds
> >
> > Clearly, one expects better performance, so I assume I'm doing
> > something wrong. By the way, I'm getting about the same performance
> > when I iterate through a scanner without MapReduce.
> >
> > Here is my hadoop-site.xml
> >
> > <configuration>
> >   <property>
> >     <name>fs.default.name</name>
> >     <value>hdfs://sb-centercluster01:9100</value>
> >   </property>
> >   <property>
> >     <name>mapred.job.tracker</name>
> >     <value>hdfs://sb-centercluster01:9101</value>
> >   </property>
> >   <property>
> >     <name>mapred.map.tasks</name>
> >     <value>13</value>
> >   </property>
> >   <property>
> >     <name>mapred.reduce.tasks</name>
> >     <value>5</value>
> >   </property>
> >   <property>
> >     <name>dfs.replication</name>
> >     <value>3</value>
> >   </property>
> >   <property>
> >     <name>dfs.name.dir</name>
> >     <value>/home/hadoop/dfs16,/tmp/hadoop/dfs16</value>
> >   </property>
> >   <property>
> >     <name>dfs.data.dir</name>
> >     <value>/state/partition1/hadoop/dfs16</value>
> >   </property>
> > </configuration>
> >
> > Increasing "io.bytes.per.checksum" and "io.file.buffer.size" didn't
> > help. Neither did decreasing "dfs.replication".
> >
> > Here is my hbase-site.xml
> >
> > <configuration>
> >   <property>
> >     <name>hbase.master</name>
> >     <value>sb-centercluster01:60002</value>
> >     <description>The host and port that the HBase master runs at.
> >     </description>
> >   </property>
> >   <property>
> >     <name>hbase.rootdir</name>
> >     <value>hdfs://sb-centercluster01:9100/hbase</value>
> >     <description>The directory shared by region servers.
> >     </description>
> >   </property>
> >   <property>
> >     <name>hbase.io.index.interval</name>
> >     <value>8</value>
> >   </property>
> > </configuration>
> >
> >
> > Any help will be appreciated.
> >
> > Thanks
> > -Yair
> >
> >
> >