I have a question. Suppose my cluster has N nodes (N > 3) and a replication factor of 3, and a data block B1 is stored on three DataNodes: DD1, DD2, and DD3.
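To make the setup concrete, here is a toy model of it in plain Java. This is only a sketch of the scenario, not Hadoop code: the class `BlockMap` and its methods `addBlock` and `locate` are hypothetical names made up for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of the scenario only; plain Java, not Hadoop code.
// BlockMap, addBlock and locate are hypothetical names.
public class BlockMap {
    static final int REPLICATION_FACTOR = 3;

    // block id -> hosts that hold a replica of that block
    private final Map<String, List<String>> replicas = new HashMap<>();

    // Register a block together with the hosts storing its replicas.
    public void addBlock(String blockId, String... hosts) {
        if (hosts.length != REPLICATION_FACTOR) {
            throw new IllegalArgumentException(
                "expected " + REPLICATION_FACTOR + " replicas, got " + hosts.length);
        }
        replicas.put(blockId, Arrays.asList(hosts));
    }

    // Return every host holding a replica, or an empty list for an unknown block.
    public List<String> locate(String blockId) {
        return replicas.getOrDefault(blockId, Collections.emptyList());
    }

    public static void main(String[] args) {
        BlockMap blocks = new BlockMap();
        blocks.addBlock("B1", "DD1", "DD2", "DD3");
        System.out.println(blocks.locate("B1")); // prints [DD1, DD2, DD3]
    }
}
```

All three hosts are equally valid places to read B1 from, which is what makes the choice between them a real question.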
I want to run a Mapper on this block. My JobTracker (JT) will ask the NameNode (NN) where it can find the block. My assumption is that the NN will return every DataNode that holds a replica of the block, in this case DD1, DD2, and DD3. Am I right about this?

My question is: how does the JT decide which of these nodes to send the mapper code to? Suppose it chooses DD1 and the TaskTracker on that machine starts running the task. If, for some reason, the task on DD1 takes longer than it would have taken on DD2, how does Hadoop detect this and decide what to do?

Thanks,
Praveenesh
