[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12853804#action_12853804
 ] 

Amar Kamat commented on MAPREDUCE-1533:
---------------------------------------

Benchmark results comparing StringBuilder with String.format:
1) StringBuilder took 1.261 sec to generate 1,000,000 strings
2) String.format took 9.126 sec to generate 1,000,000 strings
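
A comparison of this kind can be sketched roughly as below. This is not the code used for the numbers above: the class name, method names, and the string layout are assumptions made for illustration, and a plain nanoTime loop like this gives only a coarse estimate (no JIT warm-up control).

```java
public class FormatBenchmark {
    // Hypothetical scheduling-info layout, chosen only for illustration.
    static String withFormat(int running, int waiting) {
        return String.format("%d running map tasks, %d waiting map tasks",
                             running, waiting);
    }

    // Same output built by plain appends, with no format-string parsing.
    static String withBuilder(int running, int waiting) {
        return new StringBuilder()
            .append(running).append(" running map tasks, ")
            .append(waiting).append(" waiting map tasks")
            .toString();
    }

    public static void main(String[] args) {
        final int n = 1000000;
        long sink = 0; // consume the results so the loops are not optimized away

        long t0 = System.nanoTime();
        for (int i = 0; i < n; i++) {
            sink += withBuilder(i, n - i).length();
        }
        long t1 = System.nanoTime();
        for (int i = 0; i < n; i++) {
            sink += withFormat(i, n - i).length();
        }
        long t2 = System.nanoTime();

        System.out.printf("StringBuilder: %.3f sec%n", (t1 - t0) / 1e9);
        System.out.printf("String.format: %.3f sec%n", (t2 - t1) / 1e9);
        System.out.println("checksum: " + sink);
    }
}
```

Absolute timings will differ by JVM and hardware; only the ratio between the two approaches is of interest here.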

Assuming 400 heartbeat calls are made per second, each heartbeat has a 
budget of ~2.5 ms. Assuming no more than 100 jobs are running at a given 
time, we have 
1) StringBuilder taking 0.1261 ms to generate 100 strings 
2) String.format taking 0.9126 ms to generate 100 strings

Thus String.format takes ~36% (i.e. 0.9126/2.5) whereas StringBuilder takes 
~5% (i.e. 0.1261/2.5) of the total heartbeat processing time. 
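
The arithmetic above can be reproduced with a short sketch. The class and method names are made up for illustration; the inputs (400 heartbeats per sec, 100 jobs, and the two benchmark timings) are the ones stated in this comment.

```java
public class HeartbeatBudget {
    // Time available per heartbeat, in ms, at a given heartbeat rate.
    static double budgetMs(int heartbeatsPerSec) {
        return 1000.0 / heartbeatsPerSec;
    }

    // Scale a "seconds per 1,000,000 strings" measurement down to
    // milliseconds for the given number of strings.
    static double costMs(double secsPerMillionStrings, int strings) {
        return secsPerMillionStrings * 1000.0 / 1000000 * strings;
    }

    // Fraction of the heartbeat budget consumed by string generation.
    static double share(double costMs, double budgetMs) {
        return costMs / budgetMs;
    }

    public static void main(String[] args) {
        double budget = budgetMs(400);          // 2.5 ms per heartbeat
        double builder = costMs(1.261, 100);    // StringBuilder, 100 jobs
        double format = costMs(9.126, 100);     // String.format, 100 jobs
        System.out.printf("StringBuilder: %.4f ms, %.1f%% of budget%n",
                          builder, 100 * share(builder, budget));
        System.out.printf("String.format: %.4f ms, %.1f%% of budget%n",
                          format, 100 * share(format, budget));
    }
}
```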

> reduce or remove usage of String.format() usage in 
> CapacityTaskScheduler.updateQSIObjects
> -----------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1533
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1533
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.20.1
>            Reporter: Rajesh Balamohan
>            Assignee: Amar Kamat
>         Attachments: mapreduce-1533-v1.4.patch
>
>
> When short jobs are executed in Hadoop with OutOfBandHeartBeat=true, the JT 
> executes the heartBeat() method heavily. This internally makes a call to 
> CapacityTaskScheduler.updateQSIObjects(). 
> CapacityTaskScheduler.updateQSIObjects() internally calls String.format() 
> to set the job scheduling information. Depending on the size of the 
> "jobQueuesManager" and "queueInfoMap" data structures, the number of times 
> String.format() gets executed becomes very high. String.format() internally 
> does pattern matching, which turns out to be very heavy. (This was revealed 
> while profiling the JT. Almost 57% of time was spent in 
> CapacityScheduler.assignTasks(), out of which String.format() took 46%.)
> Would it be possible to do String.format() only at the time of invoking 
> JobInProgress.getSchedulingInfo? This might reduce the pressure on the JT 
> while processing heartbeats. 
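
The deferral suggested in the description above could look roughly like the following sketch. It is not the attached patch: the class, its fields, and the string layout are hypothetical. The idea is that the per-heartbeat update stores only raw counters, and the string is built only when something asks for it.

```java
public class LazySchedulingInfo {
    // Raw counters; updating these is cheap and safe to do on every heartbeat.
    private int runningMaps;
    private int waitingMaps;

    // Counts how often the expensive formatting actually ran (for illustration).
    private int formatCalls;

    // Called from the heartbeat path: no string work here.
    public void update(int runningMaps, int waitingMaps) {
        this.runningMaps = runningMaps;
        this.waitingMaps = waitingMaps;
    }

    public int getFormatCalls() {
        return formatCalls;
    }

    // Formatting happens only when the info is read, e.g. for display.
    @Override
    public String toString() {
        formatCalls++;
        return String.format("%d running map tasks, %d waiting map tasks",
                             runningMaps, waitingMaps);
    }

    public static void main(String[] args) {
        LazySchedulingInfo info = new LazySchedulingInfo();
        for (int i = 0; i < 1000; i++) {
            info.update(i, 1000 - i); // simulated heartbeats
        }
        System.out.println(info);     // the only point where formatting runs
        System.out.println("format calls: " + info.getFormatCalls());
    }
}
```

With this shape the formatting cost is paid per read rather than per heartbeat, at the price of formatting again on every read unless the result is cached.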

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
