[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13938139#comment-13938139
 ] 

Chris Nauroth commented on MAPREDUCE-5791:
------------------------------------------

Thanks for the update, [~nikola.vujic].  A few more thoughts on this patch:

# {{mapred-default.xml}}
## I think by setting {{mapreduce.shuffle.transferTo.allowed}} to {{true}} in 
mapred-default.xml, you'll end up getting transferTo turned on in Windows by 
default.  The value defined in the XML takes precedence over the default in the 
code.  Instead, if you set the value to nothing/empty string in the XML, then 
your logic in {{ShuffleHandler}} will select the right value at runtime based 
on OS.  For an existing example of this, see 
{{mapreduce.application.classpath}}.
# {{FadvisedFileRegion}}
## It appears that {{customShuffleTransfer}} is public only to make it 
accessible for tests.  Is that correct?  If so, would you please change this to 
package-private (drop public) and add the 
{{com.google.common.annotations.VisibleForTesting}} annotation on the method?
## At the start of {{customShuffleTransfer}}, there is some indentation by 4 
spaces in front of the {{throw}} and the {{return 0L;}}.  Would you please 
change those to 2 spaces?
## I was a bit confused by this logic:
{code}
      //adjust counters and buffer limit
      if(readSize < trans) {
        trans -= readSize;
        position += readSize;
        byteBuffer.flip();
      } else {
        byteBuffer.limit((int)trans);
        byteBuffer.position(0);
        position += trans; 
        trans = 0;
      }
{code}
I don't understand why we need the conditional.  Looking specifically at the 
else block, setting limit to current position and setting current position to 0 
is the same effect as calling {{Buffer#flip}}.  Updating {{position}} looks 
unnecessary, because setting {{trans}} to 0 forces this to be the final 
iteration of the loop.  Overall, it appears that this code could be simplified 
by dropping the if/else and always running the logic currently in the if block. 
 What do you think?  Is there some edge case that I'm missing?
# {{TestFadvisedFileRegion}}
## There is still a risk of a resource leak in the test, because the 
{{RandomAccessFile}} constructor can throw an exception.  You could 
successfully open {{inputFile}}, and then {{targetFile}} fails with an 
exception, but then {{inputFile}} would never be closed.  I recommend declaring 
{{inputFile}}, {{targetFile}} and {{target}} outside the try block, initialized 
to null, and then initialize them first thing inside the try block.

> Shuffle phase is slow in Windows - FadviseFileRegion::transferTo does not 
> read disks efficiently
> ------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5791
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5791
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Nikola Vujic
>            Assignee: Nikola Vujic
>         Attachments: MAPREDUCE-5791.patch, MAPREDUCE-5791.patch
>
>
> transferTo method in org.apache.hadoop.mapred.FadvisedFileRegion is using 
> transferTo method from a FileChannel to transfer data from a disk to socket. 
> This is performing slow in Windows, slower than in Linux. The reason is that 
> transferTo method for the java.nio is issuing 32K IO requests all the time. 
> In Windows, these 32K transfers are not optimal and we don't get the best 
> performance form the underlying IO subsystem. In order to achieve better 
> performance when reading from the drives, we need to read data in bigger 
> chunks, 512K for example.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to