Hi, I have a question about using HoD and how local the TaskTrackers it allocates end up being to the data.
Suppose I have a long-running HDFS installation, with TaskTracker/JobTracker nodes allocated dynamically by HoD, and I upload my data to HDFS with "dfs -put" before allocating nodes or running my job. I then allocate some nodes with HoD and run my job over that data.

Does HoD take into account which HDFS nodes hold my data when it allocates TaskTrackers (e.g. by checking which DataNodes store blocks belonging to the current user)? If the nodes are allocated arbitrarily, doesn't that break Hadoop's design principle of moving the computation to the data? And if HoD doesn't currently consider block locations when allocating nodes, are there plans to incorporate that in the future?

Thanks,
Jiaqi Tan
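
P.S. For concreteness, here is roughly what I mean by "looking at which DataNodes hold blocks": a minimal sketch, assuming a FileSystem API that exposes getFileBlockLocations (the path /user/jiaqi/input is just a placeholder for wherever the data was put).

import java.util.Arrays;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockHosts {
    public static void main(String[] args) throws Exception {
        // Assumes the HDFS configuration (fs.default.name etc.) is on the classpath.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Hypothetical directory: wherever the data was uploaded with "dfs -put".
        Path dir = new Path("/user/jiaqi/input");

        for (FileStatus file : fs.listStatus(dir)) {
            // Ask the NameNode which hosts hold each block of this file.
            BlockLocation[] blocks =
                fs.getFileBlockLocations(file, 0, file.getLen());
            for (BlockLocation block : blocks) {
                System.out.println(file.getPath() + " "
                    + block.getOffset() + "+" + block.getLength()
                    + " -> " + Arrays.toString(block.getHosts()));
            }
        }
        fs.close();
    }
}

This is the kind of block-to-host mapping I was wondering whether HoD consults before choosing which nodes to hand to the TaskTrackers.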
