It is very possible (even easy).

The data nodes run the datanode process.  The task nodes run the task
tracker.  If a node runs the datanode but no task tracker, it will store
data but won't do any computation.
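As a minimal sketch, assuming a stock Hadoop install where daemons are
started individually with bin/hadoop-daemon.sh (and the conf/ files
already point at your namenode and jobtracker), you would simply start
different daemons on different machines:

```shell
# On the storage nodes: start only the HDFS datanode daemon.
# These nodes will hold blocks but run no map/reduce tasks.
bin/hadoop-daemon.sh start datanode

# On the compute nodes: start only the tasktracker daemon.
# These nodes will run map/reduce tasks but store no HDFS blocks.
bin/hadoop-daemon.sh start tasktracker
```

Note that any tasktracker-only node will read its input blocks over the
network, which is exactly the extra load described below.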


On 3/13/08 8:22 AM, "Andrey Pankov" <[EMAIL PROTECTED]> wrote:

> Thanks, Ted!
> 
> I also thought it was not a good idea to separate them out; I was just
> wondering whether it is possible at all. Thanks!
> 
> 
> Ted Dunning wrote:
>> It is quite possible to do this.
>> 
>> It is also a bad idea.
>> 
>> One of the great things about map-reduce architectures is that data is near
>> the computation so that you don't have to wait for the network.  If you
>> separate data and computation, you impose additional load on the cluster.
>> 
>> What this will do to your throughput is an open question and it depends a
>> lot on your programs.
>> 
>> 
>> On 3/13/08 1:42 AM, "Andrey Pankov" <[EMAIL PROTECTED]> wrote:
>> 
>>> Hi,
>>> 
>>> Is it possible to configure a Hadoop cluster so that data nodes and
>>> worker nodes are separate? I.e. nodes 1, 2, 3 store data in HDFS,
>>> while nodes 3, 4 and 5 run the map-reduce jobs and take their data
>>> from HDFS?
>>> 
>>> If it's possible, what impact will it have on performance? Any suggestions?
>>> 
>>> Thanks in advance,
>>> 
>>> --- Andrey Pankov
>> 
>> 
> 
> ---
> Andrey Pankov
