[ https://issues.apache.org/jira/browse/MAPREDUCE-6678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15276694#comment-15276694 ]
Eric Payne commented on MAPREDUCE-6678:
---------------------------------------

Thanks, [~nroberts], for raising this issue and providing a patch.

bq. Tested this patch on a 10-node cluster using terasort. Verified using strace that nodemanager is issuing correct WILLNEED without DONTNEED.

I recognize that it's difficult to produce a unit test for the patch. Would it be possible for you to post a very brief justification of that?

Otherwise, patch looks good to me. +1

> Allow ShuffleHandler readahead without drop-behind
> --------------------------------------------------
>
>                 Key: MAPREDUCE-6678
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6678
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: nodemanager
>   Affects Versions: 3.0.0, 2.7.2
>            Reporter: Nathan Roberts
>            Assignee: Nathan Roberts
>         Attachments: YARN-4964.001.patch
>
>
> Currently mapreduce.shuffle.manage.os.cache enables/disables both readahead
> (POSIX_FADV_WILLNEED) and drop-behind (POSIX_FADV_DONTNEED) logic within the
> ShuffleHandler.
> It would be beneficial if these were separately configurable.
> - Running without readahead can lead to significant seek storms caused by
> large numbers of sendfiles() competing with one another.
> - However, running with drop-behind can also lead to seek storms because
> there are cases where the server can successfully write the shuffle bytes to
> the network, BUT the client doesn't want the bytes right now (MergeManager
> wants to WAIT is an example) so it ignores them and asks for them again a bit
> later. This causes repeated reads of the same data from disk.
> I'll attach a simple patch that enables/disables readahead based on
> mapreduce.shuffle.readahead.bytes==0, leaving
> mapreduce.shuffle.manage.os.cache controlling only the drop-behind.
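
For readers following the thread, the decoupling described in the issue can be sketched roughly as below. This is not the attached patch: the class ShuffleCachePolicy, its method names, and the use of a plain Map in place of Hadoop's Configuration are hypothetical, and the default values are assumptions made for illustration. Only the two property names and the "readahead.bytes == 0 disables readahead" rule come from the description above.

```java
import java.util.Map;

/**
 * Hypothetical sketch of the proposed behavior: readahead and drop-behind
 * are decided by two independent settings instead of one shared switch.
 */
public class ShuffleCachePolicy {
    // Property names as given in the issue description.
    static final String MANAGE_OS_CACHE = "mapreduce.shuffle.manage.os.cache";
    static final String READAHEAD_BYTES = "mapreduce.shuffle.readahead.bytes";

    // Defaults are assumptions for this sketch, not taken from the thread.
    static final boolean ASSUMED_DEFAULT_MANAGE_OS_CACHE = true;
    static final long ASSUMED_DEFAULT_READAHEAD_BYTES = 4L * 1024 * 1024;

    /** Readahead (POSIX_FADV_WILLNEED) is on unless readahead bytes is 0. */
    static boolean readaheadEnabled(Map<String, String> conf) {
        String raw = conf.get(READAHEAD_BYTES);
        long bytes = (raw == null)
            ? ASSUMED_DEFAULT_READAHEAD_BYTES
            : Long.parseLong(raw.trim());
        return bytes > 0;
    }

    /** Drop-behind (POSIX_FADV_DONTNEED) follows manage.os.cache alone. */
    static boolean dropBehindEnabled(Map<String, String> conf) {
        String raw = conf.get(MANAGE_OS_CACHE);
        return (raw == null)
            ? ASSUMED_DEFAULT_MANAGE_OS_CACHE
            : Boolean.parseBoolean(raw.trim());
    }
}
```

Under this sketch, setting mapreduce.shuffle.manage.os.cache=false while leaving mapreduce.shuffle.readahead.bytes nonzero gives readahead without drop-behind, which corresponds to the "WILLNEED without DONTNEED" behavior reported from the strace check.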