Re: HoD and locality of TaskTrackers to data (on DataNodes)

Hemanth Yamijala Sun, 23 Mar 2008 21:11:34 -0700

Jiaqi,

Hi,


> I have a question about using HoD and the locality of the assigned
> TaskTrackers to the data.
>
> Suppose I have a long-running HDFS installation with
> TaskTrackers/JobTracker nodes dynamically allocated by HoD, and I
> uploaded my data to HDFS prior to running my job/allocating nodes
> using "dfs -put". Then, I allocate some nodes and run my job on that
> data using HoD. Would the nodes allocated by HoD take into account the
> HDFS nodes on which my data resides (e.g. by looking at which
> DataNodes hold blocks that belong to the current user)? If the nodes
> are just arbitrarily allocated, doesn't that break Hadoop's design
> principle of having processing take place near the data?
>
> And if HoD doesn't currently take block location into account when
> allocating nodes, are there future plans for that to be incorporated?

Excellent point! HOD does not currently take this into account. We are working on ways in which we can accomplish this using configuration outside HOD (i.e. in Torque / some Hadoop features in 0.17 like HADOOP-1985). I will update this list (and possibly also documentation) on how this can be set up, after we have some more concrete results.
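Purely to illustrate the idea raised in the question (this is not something HOD or Torque does), a locality-aware allocator could greedily pick, from the nodes available for allocation, the node that holds a replica of the most not-yet-covered input blocks, repeating until it has enough nodes. The function name `pick_nodes_for_locality` and the block-to-hosts mapping are hypothetical assumptions; in a real cluster the block locations would have to come from the NameNode.

```python
def pick_nodes_for_locality(block_locations, free_nodes, k):
    """Hypothetical sketch: choose up to k nodes from free_nodes, greedily
    maximizing how many input blocks have at least one local replica.

    block_locations: dict of block id -> list of DataNode hostnames holding
                     a replica (assumed input, not a real HOD/Torque API).
    free_nodes:      hostnames the resource manager could allocate.
    """
    free = set(free_nodes)
    # For each block, keep only the replica holders that we could allocate.
    uncovered = {blk: set(hosts) & free for blk, hosts in block_locations.items()}
    chosen = []
    while len(chosen) < k:
        remaining = free - set(chosen)
        if not remaining:
            break

        def gain(host):
            # Number of still-uncovered blocks this host would make local.
            return sum(1 for hosts in uncovered.values() if host in hosts)

        # Highest gain first; ties broken by hostname so the result is deterministic.
        best = min(remaining, key=lambda h: (-gain(h), h))
        chosen.append(best)
        # Blocks with a replica on the chosen node no longer need covering.
        uncovered = {blk: hs for blk, hs in uncovered.items() if best not in hs}
    return chosen


if __name__ == "__main__":
    blocks = {
        "blk_1": ["dn1", "dn2"],
        "blk_2": ["dn2", "dn3"],
        "blk_3": ["dn3", "dn4"],
        "blk_4": ["dn4"],
    }
    print(pick_nodes_for_locality(blocks, ["dn1", "dn2", "dn3", "dn4", "dn5"], 2))
```

With the sample layout above, asking for two nodes returns `['dn2', 'dn4']`, which together hold a replica of every block. A greedy cover is used rather than ranking nodes by raw replica count because a second local replica of the same block does not improve locality.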


Thanks
Hemanth
