[ https://issues.apache.org/jira/browse/HADOOP-3164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12591471#action_12591471 ]

Raghu Angadi commented on HADOOP-3164:
--------------------------------------


Buffer size might matter on Linux as well, to some extent (maybe 15-20%) 
between 4k and 128k. I could not reproduce this on my dev box, where the disk 
maxes out at 30-35 MB/s, but DFSIO seems to show it.
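For illustration, the buffer-size knob in question boils down to something like the copy loop below (a minimal sketch; the class, method names, and sizes are mine, not Hadoop's actual code):

```java
import java.io.*;

// Sketch of why buffer size matters: the same data copied with a 4 KB vs a
// 128 KB buffer takes ~32x more read/write calls with the small buffer, which
// is where a 15-20% throughput difference can come from. Illustrative only.
public class BufferedCopy {
    static long copy(InputStream in, OutputStream out, int bufferSize) throws IOException {
        byte[] buf = new byte[bufferSize];
        long total = 0;
        int n;
        while ((n = in.read(buf)) > 0) {
            out.write(buf, 0, n);
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[320 * 1024]; // stand-in for block data
        ByteArrayOutputStream small = new ByteArrayOutputStream();
        ByteArrayOutputStream large = new ByteArrayOutputStream();
        copy(new ByteArrayInputStream(data), small, 4 * 1024);    // many syscall-sized hops
        copy(new ByteArrayInputStream(data), large, 128 * 1024);  // few large hops
        System.out.println(small.size() == large.size());         // same bytes either way
    }
}
```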

TestDFSIO essentially tests (bursty) write and read of a 320MB file. Each 
mapper writes or reads such a file and reports the time taken for this IO. With 
the patch, results showed a 10-20% dip in DFSIO (the smaller the cluster, the 
larger the difference). To rule out misconfiguration, I tried a patch with a 
hard-coded 128KB buffer size while sending a block, and results came back to 
normal.
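The per-mapper measurement TestDFSIO makes can be sketched roughly as below (scaled down, local disk instead of HDFS; the class and method names are mine):

```java
import java.io.*;

// Rough sketch of a per-mapper DFSIO-style measurement: write a file of a
// fixed size through a buffered stream and report MB/s. DFSIO uses 320 MB
// per mapper against HDFS; this uses a small local file for illustration.
public class IoTimer {
    static double writeMbPerSec(File f, long bytes, int bufferSize) throws IOException {
        byte[] chunk = new byte[bufferSize];
        long start = System.nanoTime();
        try (OutputStream out = new BufferedOutputStream(new FileOutputStream(f), bufferSize)) {
            long written = 0;
            while (written < bytes) {
                int n = (int) Math.min(chunk.length, bytes - written);
                out.write(chunk, 0, n);
                written += n;
            }
        }
        double secs = (System.nanoTime() - start) / 1e9;
        return (bytes / (1024.0 * 1024.0)) / secs;
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("dfsio-sketch", ".dat");
        f.deleteOnExit();
        double rate = writeMbPerSec(f, 8L * 1024 * 1024, 128 * 1024);
        System.out.println("write rate ~ " + rate + " MB/s");
    }
}
```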

Once we confirm misconfiguration, we can consider a buffer-size "cut off" for 
transferTo(). With transferTo(), the DataNode does not actually allocate the 
buffer. In that sense, we could increase the size in the DataNode without 
affecting client buffering (apart from a slight increase in the buffer for 
checksums). 
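The cutoff idea might look roughly like this (a sketch only; the 128KB threshold, class, and method names are assumptions of mine, not what the patch actually does):

```java
import java.io.*;
import java.nio.ByteBuffer;
import java.nio.channels.*;

// Sketch of a buffer-size cutoff for transferTo(): above the threshold, use
// FileChannel.transferTo(), which copies in the kernel without a user-space
// buffer in the DataNode; below it, fall back to an ordinary buffered copy.
public class TransferWithCutoff {
    static final long TRANSFER_TO_CUTOFF = 128 * 1024; // illustrative threshold

    static long sendBlock(FileChannel file, long pos, long len,
                          WritableByteChannel out) throws IOException {
        if (len >= TRANSFER_TO_CUTOFF) {
            // Kernel-side transfer: no buffer allocated here, so a larger
            // effective size costs no client-visible memory.
            long sent = 0;
            while (sent < len) {
                sent += file.transferTo(pos + sent, len - sent, out);
            }
            return sent;
        }
        // Small read: copy through a user-space buffer as before.
        ByteBuffer buf = ByteBuffer.allocate((int) len);
        file.read(buf, pos);
        buf.flip();
        long sent = 0;
        while (buf.hasRemaining()) {
            sent += out.write(buf);
        }
        return sent;
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("xfer-sketch", ".dat");
        f.deleteOnExit();
        try (FileOutputStream fos = new FileOutputStream(f)) {
            fos.write(new byte[4096]);
        }
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (FileChannel fc = new FileInputStream(f).getChannel()) {
            long sent = sendBlock(fc, 0, 4096, Channels.newChannel(sink));
            System.out.println(sent == 4096 && sink.size() == 4096);
        }
    }
}
```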

> Use FileChannel.transferTo() when data is read from DataNode.
> -------------------------------------------------------------
>
>                 Key: HADOOP-3164
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3164
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: Raghu Angadi
>            Assignee: Raghu Angadi
>             Fix For: 0.18.0
>
>         Attachments: HADOOP-3164.patch, HADOOP-3164.patch, HADOOP-3164.patch, 
> HADOOP-3164.patch, HADOOP-3164.patch
>
>
> HADOOP-2312 talks about using FileChannel's 
> [{{transferTo()}}|http://java.sun.com/javase/6/docs/api/java/nio/channels/FileChannel.html#transferTo(long,%20long,%20java.nio.channels.WritableByteChannel)]
>  and 
> [{{transferFrom()}}|http://java.sun.com/javase/6/docs/api/java/nio/channels/FileChannel.html#transferFrom(java.nio.channels.ReadableByteChannel,%20long,%20long)]
>  in DataNode. 
> At the time, DataNode neither used NIO sockets nor wrote large chunks of 
> contiguous block data to the socket. Hadoop 0.17 does both when data is served 
> to clients (and other datanodes). I am planning to try using transferTo() in 
> trunk. This might reduce DataNode's cpu by another 50% or more.
> Once HADOOP-1702 is committed, we can look into using transferFrom().

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
