[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12853785#action_12853785
 ] 

Amar Kamat commented on MAPREDUCE-1533:
---------------------------------------

How about using _StringBuilder_ instead of _String.format_? The underlying 
problem is the way scheduling info is managed. As of now it's a push model, 
where every change in the scheduler's state produces an info string that gets 
pushed to all the jobs. Shouldn't it be a pull model, wherein the jobs pull the 
data from the scheduler whenever required? Roughly 100 heartbeat calls are made 
per second, and on every heartbeat the scheduler's state can potentially change, 
resulting in an info string being pushed. That is, most of the time the info 
gets overwritten before it is consumed, which makes the pull model a good fit 
for this case. But for now we can keep it simple and solve the problem at hand 
by using StringBuilder. Thoughts?
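
To make the StringBuilder suggestion concrete, here is a minimal sketch. The 
class name, method names, and message layout are hypothetical and are not taken 
from updateQSIObjects(); the point is only that plain appends can produce the 
same text without parsing a format pattern on every call.

```java
import java.util.Locale;

class SchedulingInfoFormat {
    // Hypothetical message layout; the real scheduling-info string differs.
    // Locale.ROOT is used here only to keep the output locale-independent.
    static String withFormat(int running, int waiting, String queue) {
        return String.format(Locale.ROOT,
                "%d running tasks, %d waiting tasks in queue %s",
                running, waiting, queue);
    }

    // Same text built with appends: no format pattern is parsed at call time.
    static String withBuilder(int running, int waiting, String queue) {
        return new StringBuilder(64)
                .append(running).append(" running tasks, ")
                .append(waiting).append(" waiting tasks in queue ")
                .append(queue)
                .toString();
    }

    public static void main(String[] args) {
        String formatted = withFormat(3, 7, "default");
        String built = withBuilder(3, 7, "default");
        System.out.println(built);
        System.out.println(formatted.equals(built));
    }
}
```

How much this saves in the JT would still have to be measured with the same 
profiling setup that exposed the problem.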

> reduce or remove usage of String.format() usage in 
> CapacityTaskScheduler.updateQSIObjects
> -----------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1533
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1533
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.20.1
>            Reporter: Rajesh Balamohan
>            Assignee: Amar Kamat
>         Attachments: mapreduce-1533-v1.4.patch
>
>
> When short jobs are executed in Hadoop with OutOfBandHeartBeat=true, the JT 
> calls its heartBeat() method very frequently. Each call in turn invokes 
> CapacityTaskScheduler.updateQSIObjects(), which calls String.format() 
> to set the job scheduling information. Depending on the sizes of the 
> "jobQueuesManager" and "queueInfoMap" data structures, the number of 
> String.format() calls becomes very high. String.format() internally does 
> pattern matching, which turns out to be very expensive. (This was revealed 
> while profiling the JT: almost 57% of the time was spent in 
> CapacityScheduler.assignTasks(), out of which String.format() took 46%.)
> Would it be possible to call String.format() only at the time of invoking 
> JobInProgress.getSchedulingInfo? This might reduce the pressure on the JT 
> while processing heartbeats.
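
The idea of building the string only when the scheduling info is requested (the 
pull model) could look roughly like the sketch below. JobState, Job, and 
describe() are hypothetical stand-ins, not the actual JobInProgress or 
CapacityTaskScheduler code; the stringsBuilt counter exists only to show how 
often a string is actually constructed.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

class PullSchedulingInfo {
    /** Hypothetical per-job counters that the scheduler updates on each heartbeat. */
    static final class JobState {
        final AtomicInteger running = new AtomicInteger();
        final AtomicInteger waiting = new AtomicInteger();
        final AtomicInteger stringsBuilt = new AtomicInteger(); // demonstration only

        /** Builds the info string; invoked only when a caller asks for it. */
        String describe() {
            stringsBuilt.incrementAndGet();
            return new StringBuilder(48)
                    .append(running.get()).append(" running, ")
                    .append(waiting.get()).append(" waiting")
                    .toString();
        }
    }

    /** Hypothetical job that pulls its info instead of storing a pushed string. */
    static final class Job {
        private final Supplier<String> infoSource;

        Job(Supplier<String> infoSource) {
            this.infoSource = infoSource;
        }

        String getSchedulingInfo() {
            return infoSource.get();
        }
    }

    public static void main(String[] args) {
        JobState state = new JobState();
        Job job = new Job(state::describe);

        // Simulate 100 heartbeats: only the counters change, no string is built.
        for (int i = 0; i < 100; i++) {
            state.running.incrementAndGet();
            state.waiting.set(99 - i);
        }

        System.out.println(job.getSchedulingInfo()); // built once, on demand
        System.out.println(state.stringsBuilt.get());
    }
}
```

In this sketch the heartbeat path only touches integer counters, and the cost of 
string construction moves to the comparatively rare reads of the scheduling info.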

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
