Xu Chen created MAPREDUCE-6193:
----------------------------------
Summary: Hadoop 2.x MapReduce Job Counter Data Local Maps Lower
than Hadoop 1.x
Key: MAPREDUCE-6193
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6193
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: mrv2
Affects Versions: 2.4.0
Reporter: Xu Chen
Assignee: Xu Chen
I run MapReduce jobs on a 400+ node cluster based on Hadoop 2.4.0, with the
configured scheduler being FairScheduler, and I noticed that the Data-Local
job counter is much lower than on Hadoop 1.x.
For example:
Hadoop 1.x: Data-Local 99%, Rack-Local 1%
Hadoop 2.4.0: Data-Local 75%, Rack-Local 25%
So I looked at the source code of the Hadoop 2.4.0 MRAppMaster and the YARN
FairScheduler, and there are some situations that may lead to this kind of
problem.
We know the MRAppMaster builds a nested request table of
Priority -> ResourceName -> Capability -> ResourceRequest, whose leaf carries
numContainers.
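As a rough illustration, the request table can be sketched like this (a
simplified stand-in with hypothetical names, not Hadoop's actual classes;
it collapses the Capability level and keeps only the numContainers leaf):

```java
import java.util.HashMap;
import java.util.Map;

// Simplified sketch of the MRAppMaster's request table:
// priority -> resource name (host, rack, or "*" for ANY) -> numContainers.
class RequestTable {
    private final Map<Integer, Map<String, Integer>> remoteRequests = new HashMap<>();

    // Roughly what addContainerReq() does: bump numContainers for the
    // task's preferred hosts, their racks, and the ANY entry.
    void addContainerReq(int priority, String[] hosts, String[] racks) {
        for (String host : hosts) increment(priority, host);
        for (String rack : racks) increment(priority, rack);
        increment(priority, "*");  // the ANY resource name
    }

    private void increment(int priority, String name) {
        remoteRequests
            .computeIfAbsent(priority, p -> new HashMap<>())
            .merge(name, 1, Integer::sum);
    }

    int numContainers(int priority, String name) {
        return remoteRequests
            .getOrDefault(priority, new HashMap<>())
            .getOrDefault(name, 0);
    }

    public static void main(String[] args) {
        RequestTable t = new RequestTable();
        t.addContainerReq(20, new String[] {"host1", "host2"}, new String[] {"/rack1"});
        t.addContainerReq(20, new String[] {"host1"}, new String[] {"/rack1"});
        System.out.println(t.numContainers(20, "host1"));  // prints 2
        System.out.println(t.numContainers(20, "*"));      // prints 2
    }
}
```

The point is that every map task inflates several entries at once (host,
rack, ANY), and the AM later decrements them as containers are assigned.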
Too many containers are assigned to the MRAppMaster by the FairScheduler.
The MRAppMaster's addContainerReq() and assignContainer() change
numContainers and send the updated ResourceRequest to the FairScheduler.
The FairScheduler resets its numContainers value from each MRAppMaster
heartbeat, but it also decrements numContainers itself when handling a
NODE_UPDATE event. So if the numContainers in the MRAppMaster's next
heartbeat is larger than the FairScheduler's current numContainers, the
extra containers are redundant for the MRAppMaster, and the MRAppMaster
assigns them Rack-Local because by then no task needs a container on those
hosts.
Besides, when one task requests more than one host, it can also cause this
problem.
So the conclusion is: the FairScheduler's numContainers is both reset by the
MRAppMaster's heartbeat and updated while handling NODE_UPDATE events, and
these two paths are asynchronous.
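The two asynchronous paths above can be sketched as a classic lost-update
race (hypothetical names, not Hadoop code; a real run interleaves these
calls from different threads, but the sequence below shows the stale
overwrite deterministically):

```java
// Sketch of the race: the scheduler's numContainers is written by two
// asynchronous paths -- the AM heartbeat overwrites it with the AM's
// value, while NODE_UPDATE handling decrements it after each assignment.
class NumContainersRace {
    private int numContainers;

    // AM heartbeat: scheduler overwrites its count with the AM's request.
    void onHeartbeat(int amNumContainers) {
        numContainers = amNumContainers;
    }

    // NODE_UPDATE: scheduler hands out a container and decrements.
    void onNodeUpdate() {
        if (numContainers > 0) numContainers--;
    }

    int get() { return numContainers; }

    public static void main(String[] args) {
        NumContainersRace sched = new NumContainersRace();
        sched.onHeartbeat(3);  // AM asks for 3 containers
        sched.onNodeUpdate();  // scheduler assigns one: 2 left
        sched.onNodeUpdate();  // scheduler assigns another: 1 left
        sched.onHeartbeat(3);  // a heartbeat built BEFORE those assignments
                               // resets the count back to 3
        System.out.println(sched.get());  // prints 3, not 1
    }
}
```

The scheduler now believes three more containers are wanted, hands out the
surplus, and the AM, whose node-local tasks are already placed, can only
use them rack-locally.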
————————————————————
I found the FairScheduler config properties
yarn.scheduler.fair.locality.threshold.node and
yarn.scheduler.fair.locality.threshold.rack,
and I'm confused: in FairScheduler's assignContainer(),
app.addSchedulingOpportunity(priority) should be invoked after the
NODE_LOCAL assignment logic, but currently it is the opposite.
That means every chance the application gets to assign a container
increments its scheduling-opportunity count, so by the time the application
actually misses a NODE_LOCAL node, the count is already greater than
locality.threshold.node most of the time. Those properties are therefore
useless for me.
——————————————
And if the AppMaster sends no ANY ResourceRequest at a given priority, the
scheduler gets an NPE and the ResourceManager exits immediately.
See this:
public synchronized int getTotalRequiredResources(Priority priority) {
  return getResourceRequest(priority, RMNode.ANY).getNumContainers();
}
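A defensive sketch of the kind of fix that would avoid the NPE (stub types
and hypothetical names, not Hadoop's classes and not a committed patch): if
no ANY request exists at the priority, treat it as zero required containers
instead of dereferencing null.

```java
import java.util.HashMap;
import java.util.Map;

// Stub sketch: the original code calls
// getResourceRequest(priority, ANY).getNumContainers() unconditionally,
// so a missing ANY entry is a NullPointerException that kills the RM.
class AppSchedulingSketch {
    static final String ANY = "*";
    private final Map<Integer, Map<String, Integer>> requests = new HashMap<>();

    void putRequest(int priority, String name, int numContainers) {
        requests.computeIfAbsent(priority, p -> new HashMap<>())
                .put(name, numContainers);
    }

    // Null-safe variant: a missing ANY request means zero containers.
    synchronized int getTotalRequiredResources(int priority) {
        Integer any = requests.getOrDefault(priority, new HashMap<>()).get(ANY);
        return any == null ? 0 : any;
    }

    public static void main(String[] args) {
        AppSchedulingSketch app = new AppSchedulingSketch();
        System.out.println(app.getTotalRequiredResources(20));  // prints 0
        app.putRequest(20, ANY, 5);
        System.out.println(app.getTotalRequiredResources(20));  // prints 5
    }
}
```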
————————
If anyone has ideas about these issues, please comment.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)