kang_min82 wrote:
Hello Matei,
Which Tasktracker did you mean here?
I don't understand that. In general we have many Tasktrackers, and each of
them runs on a separate Datanode. Why doesn't the JobTracker talk directly
to the Namenode for a list of Datanodes and then perform the MapReduce
tasks there?
1. There's no requirement for a 1:1 mapping of task-trackers to
datanodes. You could bring up TTs on any machine on your network with
spare CPU cycles, all talking to a long-lived filesystem built from a few
datanodes.
2. There's no requirement for HDFS. You could have a cluster of
MapReduce nodes talking to other filesystems. Locality of data helps,
but is not needed.
3. Layering makes for cleaner code.
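To make the three points concrete, here is a toy scheduler sketch. It is not Hadoop's code. Every name in it (`FileSystem`, `block_hosts`, `TaskTracker`, `FakeHDFS`, `LocalFS`, `pick_tracker`) is hypothetical. The scheduler asks a filesystem abstraction where the data lives, never a Namenode directly. It prefers a data-local tracker when one is free, and otherwise falls back to any free tracker, including one on a machine that holds no data at all. In Hadoop itself, locality hints reach the JobTracker through the input splits (`InputSplit.getLocations()`), which is roughly the same layering idea.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Protocol


class FileSystem(Protocol):
    """Hypothetical filesystem layer: HDFS, a local FS, something else."""

    def block_hosts(self, path: str) -> List[str]:
        """Return hosts holding a copy of the data (may be empty)."""
        ...


@dataclass
class TaskTracker:
    host: str
    free_slots: int


@dataclass
class FakeHDFS:
    """Toy stand-in for HDFS: maps each path to the hosts storing its blocks."""
    locations: Dict[str, List[str]] = field(default_factory=dict)

    def block_hosts(self, path: str) -> List[str]:
        return self.locations.get(path, [])


class LocalFS:
    """A filesystem that offers no locality information at all."""

    def block_hosts(self, path: str) -> List[str]:
        return []


def pick_tracker(fs: FileSystem, path: str,
                 trackers: List[TaskTracker]) -> Optional[TaskTracker]:
    """Prefer a data-local tracker with a free slot; otherwise take any free one."""
    free = [t for t in trackers if t.free_slots > 0]
    if not free:
        return None
    local_hosts = set(fs.block_hosts(path))
    for t in free:
        if t.host in local_hosts:
            return t  # data-local: cheaper, but only an optimisation
    return free[0]  # no local tracker or no locality info: the task still runs
```

With trackers `[TaskTracker("spare-cpu-box", 1), TaskTracker("node1", 0), TaskTracker("node2", 2)]`, a file stored on `node1` and `node2` in `FakeHDFS` is scheduled on `node2`, because `node1` has no free slot. The same job reading from `LocalFS` lands on `spare-cpu-box`, a machine that runs no datanode. Because the scheduler depends only on `block_hosts`, swapping filesystems requires no change to the scheduling code.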