Hi Andy,
I think I figured it out.
We will have to set the mapred.hosts and dfs.hosts properties in hadoop-site.xml
as follows:
<property>
  <name>dfs.hosts</name>
  <value>filename1</value>
  <description>Names a file that contains a list of hosts that are
  permitted to connect to the namenode. The full pathname of the file
  must be specified. If the value is empty, all hosts are
  permitted.</description>
</property>
[where filename1 will contain the list of instances to be used for
storage]
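For example, filename1 would simply list one host per line (these
EC2-style hostnames are just placeholders):

ip-10-0-0-1.ec2.internal
ip-10-0-0-2.ec2.internal
ip-10-0-0-3.ec2.internal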
<property>
  <name>mapred.hosts</name>
  <value>filename2</value>
  <description>Names a file that contains the list of nodes that may
  connect to the jobtracker. If the value is empty, all hosts are
  permitted.</description>
</property>
[where filename2 will contain the list of instances that will carry out
computation tasks]
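filename2 would have the same format, listing the compute instances
(again, hypothetical hostnames):

ip-10-0-1-1.ec2.internal
ip-10-0-1-2.ec2.internal
ip-10-0-1-3.ec2.internal

Also, I believe running "hadoop dfsadmin -refreshNodes" after editing
dfs.hosts makes the namenode re-read the file without a restart, though
I haven't verified this on our version.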
Correct me if I am wrong.
Thanks once again,
Rakhi.
On Wed, Apr 8, 2009 at 10:45 AM, Rakhi Khatwani <[email protected]> wrote:
> Hi Andy,
> Thanks for your suggestion.
> But I was wondering how we could separate HDFS storage from MapReduce
> computations, as MapReduce uses the same master/slave configuration as HDFS.
>
> Did you mean using one set of instances as slaves and another set of
> instances as region servers?
>
> Thanks in advance,
> Rakhi
>
>
> On Tue, Apr 7, 2009 at 11:06 PM, Andrew Purtell <[email protected]> wrote:
>
>>
>> Hi Rakhi,
>>
>> The "cannot obtain block" error is actually a HDFS problem. Most
>> likely this block was lost by HDFS during a period of excessive
>> load. Usually the first sign you are using insufficient
>> resources for your load is filesystem issues such as these. To
>> address the problems I recommend you do two things at once.
>>
>> 1) The minimum usable instance type for HBase (and Hadoop) is
>> large, in my opinion. The basic rule of thumb for HBase and
>> Hadoop daemons is you must allocate 1GB of heap/RAM and one
>> CPU (or vcpu) thread for each daemon. You can search the
>> hbase-user@ archives for previous discussion on this topic.
>>
>> 2) Allocate more instances to spread the load on DFS.
>>
>> On EC2 I recommend running storage such as HDFS/HBase on one set
>> of instances and mapreduce computations on another set. Hadoop
>> and HBase daemons are sensitive to thread starvation problems.
>>
>> Hope this helps,
>>
>> - Andy
>>
>> > From: Rakhi Khatwani
>> > Subject: Region Servers going down frequently
>> > Date: Tuesday, April 7, 2009, 2:45 AM
>> > Hi,
>> > I have a 20-node cluster on EC2 (small instances).... I
>> > have a set of tables which store a huge amount of data (tried
>> > with 10,000 rows... more to be added).... but during my map
>> > reduce jobs, some of the region servers shut
>> > down, causing data loss and halting my program's
>> > execution; in fact, one of my tables got damaged. Whenever
>> > I scan the table, I get the "could not obtain block" error.