[jira] [Commented] (MAPREDUCE-5601) ShuffleHandler fadvises file regions as DONTNEED even when fetch fails

Hudson (JIRA) Fri, 01 Nov 2013 06:10:14 -0700

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13811230#comment-13811230 ]


Hudson commented on MAPREDUCE-5601:
-----------------------------------

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1596 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1596/])
MAPREDUCE-5601. ShuffleHandler fadvises file regions as DONTNEED even when 
fetch fails (Sandy Ryza) (sandy: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1537855)
* /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/FadvisedFileRegion.java
* /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/ShuffleHandler.java


> ShuffleHandler fadvises file regions as DONTNEED even when fetch fails
> ----------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5601
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5601
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 2.2.0
>            Reporter: Sandy Ryza
>            Assignee: Sandy Ryza
>             Fix For: 2.3.0
>
>         Attachments: MAPREDUCE-5601.patch, MAPREDUCE-5601.patch, 
> MAPREDUCE-5601.patch
>
>
> When a reducer initiates a fetch request, it does not know whether it will be 
> able to fit the fetched data in memory.  The first part of the response tells 
> how much data will be coming.  If space is not currently available, the 
> reducer will abandon its request and try again later.  When this occurs, the 
> ShuffleHandler still fadvises the file region as DONTNEED, which means that the 
> next time it's asked for, it will definitely be read from disk, even if it 
> happened to be in the page cache before the request.
> I noticed this when trying to figure out why my job was doing so much more 
> disk IO in MR2 than in MR1.  When I turned the fadvise stuff off, I found 
> that disk reads went to nearly 0 on machines that had enough memory to fit 
> map outputs into the page cache.  I then straced the NodeManager and noticed 
> that there were over four times as many fadvise DONTNEED calls as map-reduce 
> pairs.  Further logging showed the same map outputs being fetched about this 
> many times.
> This is a regression from MR1, which only did the fadvise DONTNEED after all 
> the bytes were transferred.
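The MR1-style behavior the description asks for (issue DONTNEED only after every byte has been transferred) can be sketched in a few lines of plain Java. This is a simplified illustration under stated assumptions, not the committed patch: CacheAdvisor, RecordingAdvisor, SketchFileRegion, ShuffleSendSketch, transferSuccessful and onSendComplete are all hypothetical names, and the native posix_fadvise call is replaced by an advisor that merely records requests so the logic can run without Hadoop, Netty or native code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a posix_fadvise(POSIX_FADV_DONTNEED) wrapper.
// A real implementation would call native code; this sketch only records calls.
interface CacheAdvisor {
    void adviseDontNeed(String path, long offset, long length);
}

// Test double that remembers every DONTNEED request as "path@offset+length".
class RecordingAdvisor implements CacheAdvisor {
    final List<String> calls = new ArrayList<>();

    @Override
    public void adviseDontNeed(String path, long offset, long length) {
        calls.add(path + "@" + offset + "+" + length);
    }
}

// Simplified, hypothetical file region (not Hadoop's FadvisedFileRegion).
class SketchFileRegion {
    private final String path;
    private final long offset;
    private final long length;
    private final CacheAdvisor advisor;
    private boolean released = false;

    SketchFileRegion(String path, long offset, long length, CacheAdvisor advisor) {
        this.path = path;
        this.offset = offset;
        this.length = length;
        this.advisor = advisor;
    }

    // Called only once every byte of the region has reached the reducer, so the
    // map output will not be requested again and can leave the page cache.
    void transferSuccessful() {
        advisor.adviseDontNeed(path, offset, length);
    }

    // Frees resources WITHOUT fadvising, so an abandoned fetch can still be
    // served from the page cache when the reducer retries.
    void releaseExternalResources() {
        released = true;
    }

    boolean isReleased() {
        return released;
    }
}

class ShuffleSendSketch {
    // Completion callback for one region write: fadvise on success only, always release.
    static void onSendComplete(SketchFileRegion region, boolean success) {
        if (success) {
            region.transferSuccessful();
        }
        region.releaseExternalResources();
    }
}
```

With this split, a fetch that the reducer abandons after reading the size header triggers only releaseExternalResources(), so the retried fetch can still hit the page cache; DONTNEED is issued once per completed transfer rather than once per attempt.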



--
This message was sent by Atlassian JIRA
(v6.1#6144)
