Hi,
     I am attaching a screenshot of the region servers list in the HBase UI.

Why are all the requests directed to only one region server (marked in red)?
Is this expected behaviour, or have I gone wrong with the configuration? And is
this one of the reasons for the poor performance of my MapReduce tasks?

Thanks,
Raakhi



On Wed, Apr 8, 2009 at 11:59 AM, Rakhi Khatwani <[email protected]> wrote:

> Hi Amandeep,
>                   I have 1GB of memory on each node of the EC2 cluster (C1
> Medium). I am using hadoop-0.19.0 and hbase-0.19.0.
> We are starting with 10,000 rows, but later that will go up to 100,000
> rows.
>
> My map task basically reads an HBase table, 'Table1', performs analysis on
> each row, and dumps the analysis results into another HBase table, 'Table2'.
> Each analysis task takes about 3-4 minutes when tested on a local machine
> (the algorithm part alone, without the MapReduce).
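>
> In outline, the job looks something like the sketch below. (This is only a
> rough sketch: it is written against the newer
> org.apache.hadoop.hbase.mapreduce API for readability rather than the
> 0.19-era mapred one, and the column family 'cf', the qualifiers
> 'data'/'result', and the analyze() stub are stand-ins for my actual schema
> and algorithm.)
>
>   import java.io.IOException;
>   import org.apache.hadoop.conf.Configuration;
>   import org.apache.hadoop.hbase.HBaseConfiguration;
>   import org.apache.hadoop.hbase.client.Put;
>   import org.apache.hadoop.hbase.client.Result;
>   import org.apache.hadoop.hbase.client.Scan;
>   import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
>   import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
>   import org.apache.hadoop.hbase.mapreduce.TableMapper;
>   import org.apache.hadoop.hbase.util.Bytes;
>   import org.apache.hadoop.mapreduce.Job;
>
>   public class AnalysisJob {
>
>     // One map() call per row of Table1; the output Put lands in Table2.
>     static class AnalysisMapper extends TableMapper<ImmutableBytesWritable, Put> {
>       @Override
>       protected void map(ImmutableBytesWritable key, Result row, Context ctx)
>           throws IOException, InterruptedException {
>         byte[] input = row.getValue(Bytes.toBytes("cf"), Bytes.toBytes("data"));
>         if (input == null) {
>           return;  // skip rows without the input column
>         }
>         byte[] result = analyze(input);  // the 3-4 minute analysis step
>         Put put = new Put(key.get());
>         put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("result"), result);
>         ctx.write(key, put);
>       }
>       private byte[] analyze(byte[] in) { return in; }  // placeholder
>     }
>
>     public static void main(String[] args) throws Exception {
>       Configuration conf = HBaseConfiguration.create();
>       Job job = Job.getInstance(conf, "table1-analysis");
>       job.setJarByClass(AnalysisJob.class);
>       Scan scan = new Scan();
>       scan.setCaching(100);        // fetch rows in batches, fewer RPC round trips
>       scan.setCacheBlocks(false);  // a full scan shouldn't evict the block cache
>       TableMapReduceUtil.initTableMapperJob("Table1", scan,
>           AnalysisMapper.class, ImmutableBytesWritable.class, Put.class, job);
>       TableMapReduceUtil.initTableReducerJob("Table2", null, job);
>       job.setNumReduceTasks(0);    // map-only: Puts go straight to Table2
>       job.waitForCompletion(true);
>     }
>   }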
>
> I have divided 'Table1' into 30 regions before sending it to the map phase,
> and set the maximum number of map tasks to 20.
> I have set the DataXceiver limit to 1024 and the ulimit to 1024.
> I am able to process only about 300 rows an hour, which feels quite slow.
> How do I increase the performance?
>
> Meanwhile, I will try setting the DataXceiver limit to 2048 and increasing
> the file limit as you mentioned.
>
> Thanks,
> Rakhi
>
>
> On Wed, Apr 8, 2009 at 11:40 AM, Amandeep Khurana <[email protected]> wrote:
>
>> 20 nodes is good enough to begin with. How much memory do you have on each
>> node? IMO, you should keep 1GB per daemon and 1GB for the MR job, like
>> Andrew suggested.
>> You don't necessarily have to separate the datanodes and tasktrackers as
>> long as you have enough resources.
>> 10,000 rows isn't big at all from an HBase standpoint. What kind of
>> computation are you doing before dumping data into HBase? And what versions
>> of Hadoop and HBase are you running?
>>
>> There's another thing you should do: increase the DataXceiver limit to
>> 2048 (that's what I use).
>>
>> If you have root privilege on the cluster, also increase the file limit to
>> 32k (see the HBase FAQ for details).
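>>
>> For example, assuming the daemons run as a 'hadoop' user (adjust the
>> username for your setup, and double-check the property spelling against
>> your Hadoop version), the xceiver limit goes into hadoop-site.xml on each
>> datanode:
>>
>>   <property>
>>     <name>dfs.datanode.max.xcievers</name>
>>     <value>2048</value>
>>   </property>
>>
>> and the 32k file limit goes into /etc/security/limits.conf:
>>
>>   hadoop  soft  nofile  32768
>>   hadoop  hard  nofile  32768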
>>
>> Try this out and see how it goes.
>>
>>
>> Amandeep Khurana
>> Computer Science Graduate Student
>> University of California, Santa Cruz
>>
>>
>> On Tue, Apr 7, 2009 at 2:45 AM, Rakhi Khatwani <[email protected]>
>> wrote:
>>
>> > Hi,
>> >      I have a 20-node cluster on EC2 (small instances). I have a set of
>> > tables which store a huge amount of data (tried with 10,000 rows; more
>> > to be added). But during my MapReduce jobs, some of the region servers
>> > shut down, thereby causing data loss and halting my program execution;
>> > in fact, one of my tables got damaged. Whenever I scan the table, I get
>> > a "could not obtain block" error.
>> >
>> > 1. I want to make the cluster more robust, since it contains a lot of
>> > data and it is really important that the tables remain stable.
>> > 2. If one of my tables gets damaged (even after restarting DFS and
>> > HBase), how do I go about recovering it?
>> >
>> > My EC2 cluster mostly has the default configuration, with hadoop-site
>> > and hbase-site carrying some entries pertaining to MapReduce (for
>> > example, the number of map tasks, mapred.task.timeout, etc.).
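>> > (Concretely, entries of this sort; the timeout value below is only an
>> > illustration, not what I actually have set:)
>> >
>> >   <property>
>> >     <name>mapred.map.tasks</name>
>> >     <value>20</value>
>> >   </property>
>> >   <property>
>> >     <name>mapred.task.timeout</name>
>> >     <value>1200000</value>  <!-- milliseconds; illustrative only -->
>> >   </property>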
>> >
>> > Your help will be greatly appreciated.
>> > Thanks,
>> > Raakhi Khatwani
>> >
>>
>
>
