[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14579634#comment-14579634
 ] 

Ray Chiang commented on MAPREDUCE-6376:
---------------------------------------

A few comments:

1) It turns out that Avro parsing is anywhere from 70% to 90% of the .jhist 
processing time.  Some data points for the json .jhist file:

- 50k mappers
-- 20 seconds overall read time
-- 16.6 seconds Avro parsing/reading
- 404k mappers
-- 68 seconds overall read time
-- 49 seconds Avro parsing/reading
- 751k mappers
-- 300 seconds overall read time
-- 280 seconds Avro parsing/reading
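
As a quick check of the percentages, here is a minimal sketch that computes the Avro share for each data point above. The function and variable names are made up for illustration and are not part of the patch.

```python
# Data points quoted above: (mappers, overall read seconds, Avro seconds).
data_points = [
    ("50k", 20.0, 16.6),
    ("404k", 68.0, 49.0),
    ("751k", 300.0, 280.0),
]


def avro_share(overall_s, avro_s):
    """Fraction of the overall read time spent in Avro parsing/reading."""
    return avro_s / overall_s


for mappers, overall_s, avro_s in data_points:
    print(f"{mappers} mappers: {avro_share(overall_s, avro_s):.0%} Avro")
```

This works out to roughly 83%, 72% and 93%, so the largest job sits slightly above the 90% figure.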

2) I couldn't get access to a machine to generate jobs with more than 50k 
mappers, but my rough experiments showed about a 4x to 5x speedup in Avro 
parsing/reading.  For the worst-case improvement on 751k mappers, I would 
expect the 300 seconds of processing time to drop to about 90 seconds.  
There is room to shave a few more seconds off the processing time here and 
there, but that's probably better left to subsequent JIRAs.
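
The 90 second estimate can be reproduced with a short sketch. It assumes (my assumption, not something measured) that only the Avro portion speeds up and the remaining 20 seconds stay the same. The function name is hypothetical.

```python
def projected_total(overall_s, avro_s, speedup):
    """Projected overall time if only the Avro portion gets `speedup`x faster.

    Assumption: the non-Avro part of the read time is unchanged.
    """
    other_s = overall_s - avro_s
    return other_s + avro_s / speedup


# 751k mapper data point: 300 seconds overall, 280 seconds in Avro.
for speedup in (4, 5):
    print(f"{speedup}x: {projected_total(300.0, 280.0, speedup):.0f} seconds")
```

At 4x this gives 20 + 70 = 90 seconds, and at 5x it gives 20 + 56 = 76 seconds.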

3) The .jhist file output format is now a configuration option, with the 
default set to JSON.


> Fix long load times of .jhist file in JobHistoryServer
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-6376
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6376
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobhistoryserver
>    Affects Versions: 2.7.0
>            Reporter: Ray Chiang
>            Assignee: Ray Chiang
>         Attachments: MAPREDUCE-6376.001.patch
>
>
> When you click on a Job link in the JHS Web UI, it loads the .jhist file.  
> For jobs which have a large number of tasks, the load time can break UI 
> responsiveness.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
