On Tue, Apr 7, 2009 at 11:29 PM, Rakhi Khatwani <[email protected]> wrote:

> Hi Amandeep,
> I have 1GB memory on each node of the ec2 cluster (C1 Medium).
> I am using hadoop-0.19.0 and hbase-0.19.0.
> Well, we were starting with 10,000 rows, but later it will go up to
> 100,000 rows.

1GB is too low. You need around 4GB to get a stable system.

> My map task basically reads an hbase table 'Table1', performs analysis
> on each row, and dumps the analysis results into another hbase table
> 'Table2'. Each analysis task takes about 3-4 minutes when tested on a
> local machine (the algorithm part... without the map reduce).
>
> I have divided 'Table1' into 30 regions before sending it to the map,
> and set the maximum number of map tasks to 20.

Let hbase do the division into regions. Leave the table in its default
state.
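Since you are scanning 'Table1' and writing results into 'Table2', the
job would typically take the shape sketched below, using the hbase-0.19
org.apache.hadoop.hbase.mapred API. Treat it as an untested sketch:
'Table1', 'Table2', the 'data:' column family, the 'data:result' output
column, and analyze() are placeholders for your actual schema and
algorithm.

  import java.io.IOException;

  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.io.BatchUpdate;
  import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
  import org.apache.hadoop.hbase.io.RowResult;
  import org.apache.hadoop.hbase.mapred.IdentityTableReduce;
  import org.apache.hadoop.hbase.mapred.TableMap;
  import org.apache.hadoop.hbase.mapred.TableMapReduceUtil;
  import org.apache.hadoop.mapred.JobClient;
  import org.apache.hadoop.mapred.JobConf;
  import org.apache.hadoop.mapred.MapReduceBase;
  import org.apache.hadoop.mapred.OutputCollector;
  import org.apache.hadoop.mapred.Reporter;

  public class AnalysisJob {

    public static class AnalysisMap extends MapReduceBase
        implements TableMap<ImmutableBytesWritable, BatchUpdate> {

      public void map(ImmutableBytesWritable row, RowResult value,
          OutputCollector<ImmutableBytesWritable, BatchUpdate> output,
          Reporter reporter) throws IOException {
        // With minutes of work per row, report progress so the
        // tasktracker doesn't kill the task for exceeding
        // mapred.task.timeout (call this periodically inside the
        // analysis if it runs long).
        reporter.progress();
        byte[] result = analyze(value);  // placeholder for the algorithm
        // Emit the result keyed on the same row; the identity reducer
        // below writes it into 'Table2'.
        BatchUpdate update = new BatchUpdate(row.get());
        update.put("data:result", result);
        output.collect(row, update);
      }

      private byte[] analyze(RowResult value) {
        return new byte[0];  // stand-in for the real 3-4 minute analysis
      }
    }

    public static void main(String[] args) throws IOException {
      JobConf job = new JobConf(new HBaseConfiguration(), AnalysisJob.class);
      job.setJobName("table1-analysis");
      // Scan the 'data:' family of 'Table1'.
      TableMapReduceUtil.initTableMapJob("Table1", "data:", AnalysisMap.class,
          ImmutableBytesWritable.class, BatchUpdate.class, job);
      TableMapReduceUtil.initTableReduceJob("Table2",
          IdentityTableReduce.class, job);
      JobClient.runJob(job);
    }
  }

If you let the table split itself, TableInputFormat hands out roughly
one split per region, so the map parallelism grows with the table
instead of being fixed at 30.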
> I have set DataXceivers to 1024 and uLimit to 1024.

Yes, increase these: 2048 dataxceivers and a 32k ulimit.

> I am able to process about 300 rows in an hour, which I feel is quite
> slow... how do I increase the performance?

The reasons are mentioned above.

> Meanwhile I will try setting the dataXceivers to 2048 and increasing
> the file limit as you mentioned.
>
> Thanks,
> Rakhi
>
> On Wed, Apr 8, 2009 at 11:40 AM, Amandeep Khurana <[email protected]>
> wrote:
>
> > 20 nodes is good enough to begin with. How much memory do you have
> > on each node? IMO, you should keep 1GB per daemon and 1GB for the MR
> > job like Andrew suggested.
> > You don't necessarily have to separate the datanodes and
> > tasktrackers as long as you have enough resources.
> > 10,000 rows isn't big at all from an hbase standpoint. What kind of
> > computation are you doing before dumping data into hbase? And what
> > versions of Hadoop and Hbase are you running?
> >
> > There's another thing you should do. Increase the DataXceivers limit
> > to 2048 (that's what I use).
> >
> > If you have root privilege over the cluster, then increase the file
> > limit to 32k (see the hbase FAQ for details).
> >
> > Try this out and see how it goes.
> >
> > Amandeep Khurana
> > Computer Science Graduate Student
> > University of California, Santa Cruz
> >
> > On Tue, Apr 7, 2009 at 2:45 AM, Rakhi Khatwani
> > <[email protected]> wrote:
> >
> > > Hi,
> > > I have a 20 node cluster on ec2 (small instance)... I have a set
> > > of tables which store a huge amount of data (tried with 10,000
> > > rows... more to be added)... but during my map reduce jobs, some
> > > of the region servers shut down, causing data loss and halting my
> > > program's execution, and in fact one of my tables got damaged.
> > > Whenever I scan the table, I get the "could not obtain block"
> > > error.
> > >
> > > 1. I want to make the cluster more robust, since it contains a lot
> > > of data and it's really important that it remains stable.
> > > 2. If one of my tables gets damaged (even after restarting dfs and
> > > hbase), how do I go about recovering it?
> > >
> > > My ec2 cluster mostly has the default configuration, with
> > > hadoop-site and hbase-site having some entries pertaining to
> > > map-reduce (for example, the number of map tasks,
> > > mapred.task.timeout, etc).
> > >
> > > Your help will be greatly appreciated.
> > > Thanks,
> > > Raakhi Khatwani
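To make the two limit changes above concrete: the xceiver limit is the
dfs.datanode.max.xcievers property (the misspelling is how hadoop
actually spells it) in hadoop-site.xml on every datanode, and the file
limit is a nofile entry in /etc/security/limits.conf. The "hadoop" user
name below is an assumption; use whichever user your daemons run as.

  <!-- hadoop-site.xml on each datanode; restart the datanodes
       afterwards -->
  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>2048</value>
  </property>

  # /etc/security/limits.conf (needs root; log in again for it to take
  # effect, then verify with `ulimit -n`)
  hadoop  soft  nofile  32768
  hadoop  hard  nofile  32768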

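On the "could not obtain block" errors in the quoted message: before
trying to repair anything at the hbase level, check whether HDFS itself
has lost blocks. A sketch, assuming hbase.rootdir points at the usual
/hbase directory on HDFS:

  # report files with missing or corrupt blocks under the hbase root
  bin/hadoop fsck /hbase -files -blocks -locations

  # if every replica of a block is gone (the datanodes holding them
  # died), fsck -move parks the damaged files in /lost+found; that data
  # is only recoverable if those datanodes come back
  bin/hadoop fsck /hbase -move

If fsck reports the filesystem as healthy, the errors were probably
transient, i.e. overloaded datanodes hitting the xceiver and file
limits above, and should go away once those limits are raised.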