[
https://issues.apache.org/jira/browse/MAPREDUCE-5601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sandy Ryza updated MAPREDUCE-5601:
----------------------------------
Description:
When a reducer initiates a fetch request, it does not know whether it will be
able to fit the fetched data in memory. The first part of the response tells
how much data will be coming. If space is not currently available, the reducer
will abandon its request and try again later. When this occurs, the
ShuffleHandler still fadvises the file region as DONTNEED, meaning that the
next time the region is requested, it will definitely be read from disk, even
if it happened to be in the page cache before the request.
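Roughly, the pattern looks like this (a sketch, not the exact
FadvisedFileRegion source; the exact signature of the NativeIO fadvise
wrapper varies across Hadoop versions):
{code:java}
// Sketch of the problematic behavior in the ShuffleHandler's file
// region: the DONTNEED fadvise fires when resources are released,
// whether or not any bytes were actually transferred.
@Override
public void releaseExternalResources() {
  if (manageOsCache && getCount() > 0) {
    try {
      // Evicts the region from the page cache, even when the
      // reducer abandoned the fetch and nothing was sent.
      NativeIO.POSIX.posixFadviseIfPossible(identifier, fd,
          getPosition(), getCount(),
          NativeIO.POSIX.POSIX_FADV_DONTNEED);
    } catch (Throwable t) {
      LOG.warn("Failed to manage OS cache for " + identifier, t);
    }
  }
  super.releaseExternalResources();
}
{code}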
I noticed this when trying to figure out why my job was doing so much more disk
IO in MR2 than in MR1. When I turned the ShuffleHandler's fadvise calls off, I
found that disk reads went to nearly 0 on machines that had enough memory to
fit the map outputs in the page cache. I then straced the NodeManager and
noticed that there were over four times as many fadvise DONTNEED calls as
map-reduce pairs. Further logging showed the same map outputs being fetched
roughly that many times.
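For reference, counts like these can be collected with something along the
lines of the following (the PID is of course environment-specific, and
fadvise64 is the x86-64 syscall name):
{noformat}
strace -c -f -e trace=fadvise64 -p <nodemanager-pid>
{noformat}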
This is a regression from MR1, which issued the fadvise DONTNEED only after all
the bytes were transferred.
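A fix along those lines would gate the DONTNEED on transfer success, e.g. by
issuing it from a success callback instead of unconditionally on release.
Sketch only; the method and variable names below are illustrative:
{code:java}
// Only drop the page cache pages after the whole region has
// actually been sent to the reducer.
public void transferSuccessful() {
  if (manageOsCache && getCount() > 0) {
    try {
      NativeIO.POSIX.posixFadviseIfPossible(identifier, fd,
          getPosition(), getCount(),
          NativeIO.POSIX.POSIX_FADV_DONTNEED);
    } catch (Throwable t) {
      LOG.warn("Failed to manage OS cache for " + identifier, t);
    }
  }
}

// Caller side: fadvise only when the Netty write future succeeded.
writeFuture.addListener(new ChannelFutureListener() {
  @Override
  public void operationComplete(ChannelFuture future) throws Exception {
    if (future.isSuccess()) {
      partition.transferSuccessful();
    }
    partition.releaseExternalResources();
  }
});
{code}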
was:
When a reducer initiates a fetch request, it does not know whether it will be
able to fit the fetched data in memory. The first part of the response tells
how much data will be coming. If space is not currently available, the reducer
will abandon its request and try again later. Unfortunately, this has some
consequences on the server side: it forces unnecessary disk and network IO, as
the server begins to read output data that will go nowhere. Also, when the
channel is closed, it triggers an fadvise DONTNEED that causes the data region
to be evicted from the OS page cache, meaning that the next time the region is
requested, it will definitely be read from disk, even if it happened to be in
the page cache before the request.
I noticed this when trying to figure out why my job was doing so much more disk
IO in MR2 than in MR1. When I turned the ShuffleHandler's fadvise calls off, I
found that disk reads went to nearly 0 on machines that had enough memory to
fit the map outputs in the page cache. I then straced the NodeManager and
noticed that there were over four times as many fadvise DONTNEED calls as
map-reduce pairs. Further logging showed the same map outputs being fetched
roughly that many times.
The fix would be to reserve space in the reducer before fetching the data.
Currently, fetching the size of the data and fetching the actual data happen
in the same HTTP request. Fixing this would require doing them in separate
HTTP requests, or transferring the sizes through the AM.
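To sketch the first option (entirely hypothetical; neither request nor any of
the helper names below exists in the current shuffle protocol):
{code:java}
// Hypothetical two-phase fetch: learn the size first, reserve memory,
// and only then ask the ShuffleHandler to stream the bytes.
long size = fetchOutputSize(host, mapId, reduceId);   // request 1: size only
MapOutput<K, V> mapOutput = merger.reserve(mapId, size, fetcherId);
if (mapOutput != null) {
  fetchOutputData(host, mapId, reduceId, mapOutput);  // request 2: the data
} else {
  // No space available yet; retry later without having touched the
  // server's disk or page cache.
  retryLater(host, mapId);
}
{code}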
> ShuffleHandler fadvises file regions as DONTNEED even when fetch fails
> ----------------------------------------------------------------------
>
> Key: MAPREDUCE-5601
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5601
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Affects Versions: 2.2.0
> Reporter: Sandy Ryza
> Assignee: Sandy Ryza
>
> When a reducer initiates a fetch request, it does not know whether it will be
> able to fit the fetched data in memory. The first part of the response tells
> how much data will be coming. If space is not currently available, the
> reducer will abandon its request and try again later. When this occurs, the
> ShuffleHandler still fadvises the file region as DONTNEED, meaning that the
> next time the region is requested, it will definitely be read from disk, even
> if it happened to be in the page cache before the request.
> I noticed this when trying to figure out why my job was doing so much more
> disk IO in MR2 than in MR1. When I turned the ShuffleHandler's fadvise calls
> off, I found that disk reads went to nearly 0 on machines that had enough
> memory to fit the map outputs in the page cache. I then straced the
> NodeManager and noticed that there were over four times as many fadvise
> DONTNEED calls as map-reduce pairs. Further logging showed the same map
> outputs being fetched roughly that many times.
> This is a regression from MR1, which issued the fadvise DONTNEED only after
> all the bytes were transferred.
--
This message was sent by Atlassian JIRA
(v6.1#6144)