It is quite possible to do this. It is also a bad idea.
One of the great things about map-reduce architectures is that data is near the computation so that you don't have to wait for the network. If you separate data and computation, you impose additional load on the cluster.

What this will do to your throughput is an open question and it depends a lot on your programs.

On 3/13/08 1:42 AM, "Andrey Pankov" <[EMAIL PROTECTED]> wrote:

> Hi,
>
> Is it possible to configure hadoop cluster in such manner where there
> are separately data-nodes and separately worker-nodes? I.e. when nodes
> 1,2,3 store data in HDFS and nodes 3,4 and 5 do the map-reduce jobs and
> take data from HDFS?
>
> If it's possible what impact will be on performance? Any suggestions?
>
> Thanks in advance,
>
> --- Andrey Pankov
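
P.S. To put a rough number on the locality point in the reply above, here is a minimal sketch in Python. It is a toy model, not how Hadoop schedules tasks: it assumes one map task per block, replication 3 across the three data nodes, and tasks handed to workers round-robin. The function name and the nine-block layout are made up for illustration.

```python
from itertools import cycle


def local_task_fraction(block_replicas, worker_nodes):
    """Fraction of map tasks that run on a node holding a replica of their block.

    Toy model (assumption): one task per block, assigned to workers
    round-robin. A real scheduler tries to place tasks near their data,
    so treat the result as a rough illustration only.
    """
    workers = cycle(sorted(worker_nodes))
    local = sum(1 for replicas in block_replicas if next(workers) in replicas)
    return local / len(block_replicas)


# Layout from the question: nodes 1-3 store data, nodes 3-5 run tasks.
blocks = [{1, 2, 3}] * 9  # hypothetical: 9 blocks, each replicated on nodes 1, 2, 3

split = local_task_fraction(blocks, {3, 4, 5})      # only node 3 holds data
colocated = local_task_fraction(blocks, {1, 2, 3})  # every worker holds data

print("split layout, local reads:    ", split)
print("colocated layout, local reads:", colocated)
```

Under these assumptions only the tasks that land on node 3 read locally in the split layout, while every task reads locally when the workers are also the data nodes. Everything else has to come over the network, which is the additional load mentioned above. How much that costs still depends on how I/O-bound your jobs are.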
