kang_min82 wrote:
Hello Matei, which TaskTracker did you mean here?
I don't understand that. In general we have many TaskTrackers, and each of
them runs on a separate DataNode. Why doesn't the JobTracker talk directly
to the NameNode for a list of DataNodes and then perform the MapReduce
tasks there?


1. There's no requirement for a 1:1 mapping of TaskTrackers to DataNodes. You could bring up TaskTrackers on any machine on your network with spare CPU cycles, talking to a long-lived filesystem built from a few DataNodes.

2. There's no requirement for HDFS. You could have a cluster of MapReduce nodes talking to other filesystems. Data locality helps performance, but it is not required.

3. Layering makes for cleaner code.
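
To make the three points concrete, here is a rough sketch of locality-aware task assignment. It is not Hadoop's actual scheduler code: `pick_task`, `pending`, and `block_hosts` are hypothetical names made up for illustration. The assumption is that the scheduler only sees a generic "which hosts hold this input" hint from whatever filesystem is in use, and treats it as a preference rather than a requirement.

```python
from typing import Dict, List, Optional, Set


def pick_task(tracker_host: str,
              pending: List[str],
              block_hosts: Dict[str, Set[str]]) -> Optional[str]:
    """Pick a pending task for the tracker on tracker_host.

    block_hosts maps a task's input split to the hosts that store it.
    It is only a hint: it may be empty (for example, a filesystem that
    reports no locations), and the tracker need not be a storage node.
    """
    if not pending:
        return None
    # Prefer a task whose input is stored on the requesting host.
    for task in pending:
        if tracker_host in block_hosts.get(task, set()):
            return task
    # No local data: still hand out work; the input is read remotely.
    return pending[0]


# Hypothetical cluster: two storage nodes plus one compute-only machine.
pending = ["split-0", "split-1"]
block_hosts = {"split-0": {"dn1"}, "split-1": {"dn2"}}

local_choice = pick_task("dn2", pending, block_hosts)
remote_choice = pick_task("compute-only", pending, block_hosts)
```

A tracker on `dn2` is given the split stored there, while the compute-only machine still gets work and simply reads its input over the network. Because the scheduler depends only on the location hint, the same logic works when the hint comes from a filesystem other than HDFS or is missing entirely.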
