Hi, I might be misunderstanding how scheduling is supposed to work, or I might have something misconfigured, but my Map/Reduce jobs don't seem to run where my data is located.
I see a bunch of messages like this:

INFO org.apache.hadoop.mapred.JobInProgress: tip:task_201106062049_0001_m_000021 has split on node:/rack1/rack1node1.local

indicating that the scheduler has correctly found the source data on my node /rack1/rack1node1. This is the only copy of the data: for the purposes of this experiment I have set dfs.replication = dfs.replication.min = dfs.replication.max = 1, so there is only one replica.

However, the JOB_SETUP, MAP, REDUCE, and JOB_CLEANUP tasks then run on arbitrary tasktrackers, usually not where the data is located, so the first thing they have to do is pull the data over the network from another node.

Did I miss something - or, hopefully, configure something wrong? :)

Ian
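For reference, forcing a single replica as described above amounts to something like the following in hdfs-site.xml (a sketch of the relevant properties, not my exact file):

```xml
<!-- hdfs-site.xml: pin every file to exactly one replica -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>   <!-- default replication for new files -->
  </property>
  <property>
    <name>dfs.replication.min</name>
    <value>1</value>   <!-- minimum replicas before a write succeeds -->
  </property>
  <property>
    <name>dfs.replication.max</name>
    <value>1</value>   <!-- cap, so nothing can request more replicas -->
  </property>
</configuration>
```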