[
https://issues.apache.org/jira/browse/MAPREDUCE-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13945407#comment-13945407
]
Nikola Vujic commented on MAPREDUCE-5791:
-----------------------------------------
Hi [~cnauroth],
I have applied all fixes except for the if-else in {{FadvisedFileRegion}}. The
edge case is reading the last chunk of data from a file.
{{customShuffleTransfer}} must read {{actualCount}} bytes from the file,
starting at {{this.position}}. This is done in the while loop, and the
{{trans}} variable tracks the number of bytes still to be read.
{{fileChannel.read}} returns the number of bytes read. For the last chunk of
data this number can be higher than the number of bytes still needed. In that
case we cannot use {{Buffer#flip}}, because flip sets the limit to the current
position, which would include the extra bytes.
For example, let's suppose that we have a 128-byte buffer and we want to read
200 bytes starting at position 1000 in a file (file size bigger than 1256
bytes). At least two iterations of the while loop will be done:
1. Iteration 1: {{fileChannel.read(byteBuffer, 1000+0)}} => 128 bytes are read
=> all 128 bytes are needed => target.write
2. Iteration 2: {{fileChannel.read(byteBuffer, 1000+128)}} => 128 bytes are
read because the file is big enough, but only the first 72 bytes are needed =>
{{byteBuffer.limit(72)}} => target.write
In the else block we don't set the limit to the current position but to a
number lower than the current position. Updating the local {{position}}
variable is needed in order to read data starting from the proper position in
the next iterations of the loop. Does it make sense?
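To make the edge case concrete, here is a minimal, self-contained sketch of the loop described above. It is not the code from the patch: the class name {{ChunkedTransferSketch}}, the method name {{transfer}}, and its parameters are hypothetical, and {{remaining}} plays the role of {{trans}}.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;

// Hypothetical illustration only; names and signature are assumptions.
final class ChunkedTransferSketch {

  /**
   * Copies up to {@code count} bytes from {@code file}, starting at
   * {@code position}, to {@code target} through a buffer of
   * {@code bufferSize} bytes. Returns the number of bytes written.
   */
  static long transfer(FileChannel file, WritableByteChannel target,
                       long position, long count, int bufferSize)
      throws IOException {
    ByteBuffer buffer = ByteBuffer.allocate(bufferSize);
    long remaining = count;   // bytes still needed ("trans" in the comment)
    long readPos = position;  // absolute file offset of the next read

    while (remaining > 0) {
      buffer.clear();
      int read = file.read(buffer, readPos);
      if (read <= 0) {
        break;                // end of file (or zero-capacity buffer)
      }
      if (read <= remaining) {
        // Every byte in the buffer is needed: flip() sets limit = position.
        buffer.flip();
        remaining -= read;
        readPos += read;
      } else {
        // More bytes were read than needed: flip() would expose the extra
        // bytes, so the limit is set manually to the bytes still needed.
        buffer.limit((int) remaining);
        buffer.position(0);
        readPos += remaining;
        remaining = 0;
      }
      // A single write may be partial, so drain the buffer completely.
      while (buffer.hasRemaining()) {
        target.write(buffer);
      }
    }
    return count - remaining;
  }
}
```

With a 128-byte buffer, position 1000 and count 200, and assuming each read fills the buffer, the first iteration takes the flip branch and writes 128 bytes, and the second takes the else branch and writes only the first 72 bytes.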
Regarding the resource leak in the test, I applied the change you suggested and
did the same with the {{fileRegion}} in order to eliminate one try block.
I changed {{customShuffleTransferCornerCases}} to private. It was public.
> Shuffle phase is slow in Windows - FadviseFileRegion::transferTo does not
> read disks efficiently
> ------------------------------------------------------------------------------------------------
>
> Key: MAPREDUCE-5791
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5791
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Reporter: Nikola Vujic
> Assignee: Nikola Vujic
> Attachments: MAPREDUCE-5791.patch, MAPREDUCE-5791.patch
>
>
> transferTo method in org.apache.hadoop.mapred.FadvisedFileRegion is using
> transferTo method from a FileChannel to transfer data from a disk to socket.
> This is performing slow in Windows, slower than in Linux. The reason is that
> transferTo method for the java.nio is issuing 32K IO requests all the time.
> In Windows, these 32K transfers are not optimal and we don't get the best
> performance from the underlying IO subsystem. In order to achieve better
> performance when reading from the drives, we need to read data in bigger
> chunks, 512K for example.
--
This message was sent by Atlassian JIRA
(v6.2#6252)