Hi,

I want to run a distributed cluster where I have, say, 20 slave machines
in 3 separate data centers that all belong to the same cluster.
Ideally, I would like the other machines in each data center to be able to
upload files (Apache log files in this case) onto the local slaves, and
then have map/reduce tasks do their work without having to move data until
the reduce phase, where the amount of data will be smaller.
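
Something like this is what I had in mind for the upload step (the namenode
address and paths are placeholders, not a working config); my understanding
is that when the writing client runs on a machine that is also a datanode,
HDFS places the first block replica on that local node:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class LogUpload {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Point at the cluster's namenode (placeholder address).
            conf.set("fs.default.name", "hdfs://namenode.example.com:9000");
            FileSystem fs = FileSystem.get(conf);
            // Copy a local Apache log into HDFS; run this on a slave so
            // the first replica lands on the local datanode.
            fs.copyFromLocalFile(new Path("/var/log/apache2/access.log"),
                                 new Path("/logs/dc1/access.log"));
            fs.close();
        }
    }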
Does Hadoop have this functionality?
How do people handle multi-data-center logging with Hadoop in this case?
Do you just copy the data into a central location?
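
If copying to a central location is the usual answer, I am guessing it
would look something like this cross-cluster copy with the FileSystem API
(cluster addresses and paths are placeholders), or the equivalent with the
distcp tool:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.FileUtil;
    import org.apache.hadoop.fs.Path;

    public class CentralCopy {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Source HDFS in one data center, destination in the central
            // cluster (both addresses are placeholders).
            FileSystem srcFs = FileSystem.get(
                    new URI("hdfs://dc1-namenode:9000"), conf);
            FileSystem dstFs = FileSystem.get(
                    new URI("hdfs://central-namenode:9000"), conf);
            // Copy the accumulated logs across clusters, keeping the source.
            FileUtil.copy(srcFs, new Path("/logs/dc1"),
                          dstFs, new Path("/logs/incoming/dc1"),
                          false, conf);
        }
    }

(I gather "hadoop distcp" does the same thing as a map/reduce job, which
is probably better for bulk transfers.)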
Regards,
Ian