[ https://issues.apache.org/jira/browse/MAPREDUCE-3512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13164852#comment-13164852 ]
Vinod Kumar Vavilapalli commented on MAPREDUCE-3512:
----------------------------------------------------
Do we have enough information to tell whether the write call itself is taking
the time, or the subsequent (h)flush? A micro-benchmark, perhaps? If the writes
are already buffered by the DFSClient, then the sync call is to blame, in which
case we can fix this simply by syncing every so often instead of on every
event.
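
To make the question above concrete, here is a minimal sketch of such a micro-benchmark. It is not code from the MR AM: TimedWriter, writeRecord and flushEvery are hypothetical names, and a plain java.io.OutputStream stands in for the HDFS stream, so out.flush() is only a placeholder for the hflush() call one would make on an FSDataOutputStream. It accumulates the time spent in write and in flush separately, and flushes only every flushEvery records.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

/** Hypothetical helper: times write() and flush() separately, flushing every N records. */
class TimedWriter {
    private final OutputStream out;
    private final int flushEvery;   // 1 reproduces "flush after every event"
    private int pending = 0;        // records written since the last flush
    long writeNanos = 0;            // total time spent inside write()
    long flushNanos = 0;            // total time spent inside flush()
    int flushCount = 0;             // number of flushes actually issued

    TimedWriter(OutputStream out, int flushEvery) {
        if (flushEvery < 1) {
            throw new IllegalArgumentException("flushEvery must be >= 1");
        }
        this.out = out;
        this.flushEvery = flushEvery;
    }

    void writeRecord(byte[] record) {
        try {
            long t0 = System.nanoTime();
            out.write(record);
            writeNanos += System.nanoTime() - t0;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (++pending >= flushEvery) {
            flush();
        }
    }

    /** Flushes only if something was written since the last flush. */
    void flush() {
        if (pending == 0) {
            return;
        }
        try {
            long t0 = System.nanoTime();
            out.flush();   // assumption: stand-in for hflush() on an HDFS stream
            flushNanos += System.nanoTime() - t0;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        flushCount++;
        pending = 0;
    }

    public static void main(String[] args) {
        // In-memory stream only demonstrates the mechanics; real numbers need a DFS stream.
        TimedWriter w = new TimedWriter(new ByteArrayOutputStream(), 50);
        for (int i = 0; i < 1000; i++) {
            w.writeRecord(new byte[128]);
        }
        w.flush();
        System.out.println("write ns=" + w.writeNanos
                + " flush ns=" + w.flushNanos + " flushes=" + w.flushCount);
    }
}
```

Running this once with flushEvery = 1 and once with a larger value against the real stream should show whether the time goes into the writes or into the flushes. If flushNanos dominates, syncing less often is the cheap fix.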
> Batch jobHistory disk flushes
> -----------------------------
>
> Key: MAPREDUCE-3512
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3512
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: mr-am, mrv2
> Affects Versions: 0.23.0
> Reporter: Siddharth Seth
>
> The mr-am flushes each individual job history event to disk for AM recovery.
> The history event handler ends up with a significant backlog for tests like
> MAPREDUCE-3402.
> History events could be batched up based on num records / time /
> TaskFinishedEvents to reduce the number of DFS writes - with the potential
> drawback of having to rerun some tasks during AM recovery.
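
The batching idea in the description could be sketched roughly as follows. This is an illustration under assumptions, not the MR AM implementation: HistoryBatcher, its parameters, and the use of plain strings for events are all hypothetical, and the sink stands in for "write the batch and flush it to DFS". A batch is flushed when it reaches maxRecords, when its oldest event is older than maxDelayMillis, or when a task-finished event arrives.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.LongSupplier;

/** Hypothetical batcher: flushes on record count, age, or a task-finished event. */
class HistoryBatcher {
    private final int maxRecords;
    private final long maxDelayMillis;
    private final LongSupplier clock;            // injectable so the policy can be tested
    private final Consumer<List<String>> sink;   // stand-in for "write + flush to DFS"
    private final List<String> buffer = new ArrayList<>();
    private long oldestMillis;                   // arrival time of the oldest buffered event

    HistoryBatcher(int maxRecords, long maxDelayMillis,
                   LongSupplier clock, Consumer<List<String>> sink) {
        this.maxRecords = maxRecords;
        this.maxDelayMillis = maxDelayMillis;
        this.clock = clock;
        this.sink = sink;
    }

    void add(String event, boolean isTaskFinished) {
        if (buffer.isEmpty()) {
            oldestMillis = clock.getAsLong();
        }
        buffer.add(event);
        boolean full = buffer.size() >= maxRecords;
        // Age is only checked when an event arrives; a real handler would
        // also need a timer so that a quiet period cannot delay a flush forever.
        boolean stale = clock.getAsLong() - oldestMillis >= maxDelayMillis;
        if (full || stale || isTaskFinished) {
            flush();
        }
    }

    /** Hands the buffered events to the sink as one batch; no-op when empty. */
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(buffer));
        buffer.clear();
    }

    public static void main(String[] args) {
        HistoryBatcher b = new HistoryBatcher(3, 1000, System::currentTimeMillis,
                batch -> System.out.println("flushing " + batch));
        b.add("TASK_STARTED t1", false);
        b.add("ATTEMPT_STARTED t1_0", false);
        b.add("TASK_FINISHED t1", true);   // forces a flush of all three events
    }
}
```

Flushing on task-finished events bounds what is lost on an AM crash to work in progress, which is the trade-off the description mentions: anything still in the buffer at crash time is invisible to recovery and may have to be rerun.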