Xu Chen created MAPREDUCE-6193:
----------------------------------

             Summary: Hadoop 2.x MapReduce Job Counter Data Local Maps Lower 
than Hadoop 1.x
                 Key: MAPREDUCE-6193
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6193
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: mrv2
    Affects Versions: 2.4.0
            Reporter: Xu Chen
            Assignee: Xu Chen


I run MapReduce jobs on a 400+ node cluster based on Hadoop 2.4.0, where the 
configured scheduler is the FairScheduler, and I noticed that the Data-Local 
job counter is much lower than on Hadoop 1.x.

For example:
Hadoop 1.x: Data-Local 99%, Rack-Local 1%
Hadoop 2.4.0: Data-Local 75%, Rack-Local 25%

So I looked into the source code of the Hadoop 2.4.0 MRAppMaster and the YARN 
FairScheduler, and there are some situations that may lead to this kind of 
problem.

We know the MRAppMaster builds a request table keyed as 
Priority -> ResourceName -> Capability -> ResourceRequest, where each 
ResourceRequest carries a numContainers count.
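
Roughly, that table has the following shape (a sketch paraphrased from 
RMContainerRequestor; the wrapper class and comments are mine, not the real 
code):

import java.util.Map;

import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.api.records.ResourceRequest;

// Illustrative sketch of the AM-side request table.
class RequestTableSketch {
  // priority -> resource name (host, rack, or "*") -> capability -> request
  Map<Priority, Map<String, Map<Resource, ResourceRequest>>> remoteRequestsTable;
  // addContainerReq() increments numContainers on the matching requests,
  // assignContainer() decrements it, and the resulting ResourceRequests
  // are shipped to the scheduler on the next AM heartbeat.
}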

Too many containers are assigned to the MRAppMaster by the FairScheduler.

The MRAppMaster's addContainerReq() and assignContainer() change the 
numContainers value, which is sent to the FairScheduler in a ResourceRequest, 
and the FairScheduler resets its copy of numContainers from the value carried 
by the MRAppMaster's heartbeat. But the FairScheduler also updates 
numContainers itself when handling a NODE_UPDATE event. So if the 
numContainers in the MRAppMaster's next heartbeat is bigger than the 
FairScheduler's current numContainers, the extra container is redundant for 
the MRAppMaster, and the MRAppMaster will assign that container to a 
Rack-Local task, because no task needs that container's host anymore.
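
A minimal, self-contained sketch of this lost-update race (the variable names 
are hypothetical; the real interaction goes through the heartbeat protocol and 
the scheduler's AppSchedulingInfo):

// Hypothetical simulation of the race between the scheduler's
// NODE_UPDATE decrement and the AM heartbeat's overwrite.
public class NumContainersRace {
    public static void main(String[] args) {
        int schedulerAsk = 2; // scheduler-side numContainers
        int amAsk = 2;        // AM-side snapshot sent on heartbeats

        // NODE_UPDATE: the scheduler allocates one container and
        // decrements its own copy of the ask.
        schedulerAsk -= 1; // scheduler now thinks 1 container is needed

        // The AM has not processed the allocation yet, so its next
        // heartbeat still carries the stale ask and overwrites the
        // scheduler-side value.
        schedulerAsk = amAsk; // back to 2: one ask is now redundant

        System.out.println("scheduler ask after heartbeat = " + schedulerAsk
            + " (the extra container will likely end up Rack-Local)");
    }
}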

Besides, when one task asks for more than one host (for example, a map task 
requests every replica location of its input split, but only one of those 
requests can actually be satisfied), it will also cause this problem, as 
sketched below.
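
A small illustration of that fan-out (host and rack names are made up; a 
plain list stands in for the real request table):

import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: one map task whose split has three replicas
// produces asks at host, rack, and ANY level for a single container.
public class LocalityFanOut {
    public static void main(String[] args) {
        List<String> outstandingAsks = new ArrayList<>();
        for (String host : new String[] {"host1", "host2", "host3"}) {
            outstandingAsks.add(host);
        }
        outstandingAsks.add("/rack1");
        outstandingAsks.add("*"); // the ANY-level ask

        // Once the task lands on host1, the remaining asks are stale
        // until the AM's next heartbeat corrects them; a container the
        // scheduler grants on host2 in the meantime is the "redundant"
        // container described above.
        System.out.println("outstanding asks = " + outstandingAsks);
    }
}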

So the conclusion is that the FairScheduler's numContainers is reset by the 
MRAppMaster's heartbeat and also modified while handling NODE_UPDATE events, 
and these two update paths are asynchronous with respect to each other.

————————————————————
I found that the FairScheduler configuration has the properties 
yarn.scheduler.fair.locality.threshold.node and 
yarn.scheduler.fair.locality.threshold.rack.

I'm confused that the FairScheduler's assignContainer() should invoke 
app.addSchedulingOpportunity(priority) after the NODE_LOCAL assignment logic, 
but right now it is the opposite: the opportunity counter is incremented every 
time the application gets a chance to assign a container. So by the time the 
application actually misses a NODE_LOCAL node, the counter is already greater 
than locality.threshold.node most of the time, and those properties are 
useless for me. A sketch of the effect follows.
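
A minimal runnable sketch of the effect (threshold semantics paraphrased from 
getAllowedLocalityLevel(); the numbers are made up):

// Hypothetical demo: because the opportunity counter is bumped on every
// offer, it crosses threshold * numNodes long before a genuine
// NODE_LOCAL miss is evaluated.
public class SchedulingOpportunityOrder {
    public static void main(String[] args) {
        int numNodes = 400;
        double nodeLocalityThreshold = 0.5; // fraction of cluster nodes
        int schedulingOpportunities = 0;

        // Current order: increment up front on every container offer,
        // before the NODE_LOCAL check runs at all.
        for (int offer = 0; offer < 300; offer++) {
            schedulingOpportunities++;
        }

        // The degrade decision then compares the inflated counter
        // against the threshold, so RACK_LOCAL is allowed immediately.
        boolean degradeToRackLocal =
            schedulingOpportunities > nodeLocalityThreshold * numNodes;
        System.out.println("degrade to RACK_LOCAL: " + degradeToRackLocal);
    }
}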
——————————————
And if the AppMaster sends no ResourceRequest.ANY request at a given priority, 
the scheduler will hit an NPE, and the ResourceManager will exit immediately.

See this method (in AppSchedulingInfo):

public synchronized int getTotalRequiredResources(Priority priority) {
  // NPE here if there is no ANY-level request at this priority:
  // getResourceRequest(priority, ResourceRequest.ANY) returns null.
  return getResourceRequest(priority, ResourceRequest.ANY).getNumContainers();
}
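
A null check along these lines would avoid the crash (a sketch of a possible 
fix, not a committed patch):

public synchronized int getTotalRequiredResources(Priority priority) {
  ResourceRequest request = getResourceRequest(priority, ResourceRequest.ANY);
  // Treat a missing ANY-level request as zero required containers
  // instead of letting the ResourceManager die on an NPE.
  return (request == null) ? 0 : request.getNumContainers();
}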
————————
If anyone has ideas about these issues, please comment.





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
