[jira] Commented: (HDFS-347) DFS read performance suboptimal when client co-located on nodes with data

Todd Lipcon (JIRA) Mon, 12 Oct 2009 01:14:59 -0700

    [ 
https://issues.apache.org/jira/browse/HDFS-347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764594#action_12764594
 ]


Todd Lipcon commented on HDFS-347:
----------------------------------

Spent some more time on the implementation this weekend. Here's a benchmark 
including checksums:

t...@todd-laptop:~/git/hadoop-common/build/hadoop-core-0.21.0-dev$ time 
./bin/hadoop fs -Ddfs.client.use.unix.sockets=false -cat bigfile bigfile 
bigfile > /dev/null

real    0m13.502s
user    0m9.561s
sys     0m2.904s

t...@todd-laptop:~/git/hadoop-common/build/hadoop-core-0.21.0-dev$ time 
./bin/hadoop fs -Ddfs.client.use.unix.sockets=true -cat bigfile bigfile bigfile 
> /dev/null
real    0m9.644s
user    0m8.321s
sys     0m1.012s

bigfile is a 700MB file that I put on my local pseudo-distributed cluster.

For comparison, here's just catting the same file:
t...@todd-laptop:~/git/hadoop-common/build/hadoop-core-0.21.0-dev$ time 
./bin/hadoop fs -cat file:///var/www/ubuntu-8.10-desktop-amd64.iso 
file:///var/www/ubuntu-8.10-desktop-amd64.iso 
file:///var/www/ubuntu-8.10-desktop-amd64.iso  > /dev/null
real    0m2.914s
user    0m1.760s
sys     0m1.068s

So, the result is about a 30% speedup over the current implementation, but 
still 3x overhead compared to local filesystem. Profiling shows that most of 
this is in the copying out of the direct FileChannels - I think a little bit of 
smart buffering there to create larger reads over the native-code boundary will 
get us closer to 2x overhead.

Will clean up the patch tomorrow and upload (though it still needs plenty of 
work to be commitable)

> DFS read performance suboptimal when client co-located on nodes with data
> -------------------------------------------------------------------------
>
>                 Key: HDFS-347
>                 URL: https://issues.apache.org/jira/browse/HDFS-347
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: George Porter
>         Attachments: HADOOP-4801.1.patch, HADOOP-4801.2.patch, 
> HADOOP-4801.3.patch, local-reads-doc
>
>
> One of the major strategies Hadoop uses to get scalable data processing is to 
> move the code to the data.  However, putting the DFS client on the same 
> physical node as the data blocks it acts on doesn't improve read performance 
> as much as expected.
> After looking at Hadoop and O/S traces (via HADOOP-4049), I think the problem 
> is due to the HDFS streaming protocol causing many more read I/O operations 
> (iops) than necessary.  Consider the case of a DFSClient fetching a 64 MB 
> disk block from the DataNode process (running in a separate JVM) running on 
> the same machine.  The DataNode will satisfy the single disk block request by 
> sending data back to the HDFS client in 64-KB chunks.  In BlockSender.java, 
> this is done in the sendChunk() method, relying on Java's transferTo() 
> method.  Depending on the host O/S and JVM implementation, transferTo() is 
> implemented as either a sendfilev() syscall or a pair of mmap() and write().  
> In either case, each chunk is read from the disk by issuing a separate I/O 
> operation for each chunk.  The result is that the single request for a 64-MB 
> block ends up hitting the disk as over a thousand smaller requests for 64-KB 
> each.
> Since the DFSClient runs in a different JVM and process than the DataNode, 
> shuttling data from the disk to the DFSClient also results in context 
> switches each time network packets get sent (in this case, the 64-kb chunk 
> turns into a large number of 1500 byte packet send operations).  Thus we see 
> a large number of context switches for each block send operation.
> I'd like to get some feedback on the best way to address this, but I think 
> providing a mechanism for a DFSClient to directly open data blocks that 
> happen to be on the same machine.  It could do this by examining the set of 
> LocatedBlocks returned by the NameNode, marking those that should be resident 
> on the local host.  Since the DataNode and DFSClient (probably) share the 
> same hadoop configuration, the DFSClient should be able to find the files 
> holding the block data, and it could directly open them and send data back to 
> the client.  This would avoid the context switches imposed by the network 
> layer, and would allow for much larger read buffers than 64KB, which should 
> reduce the number of iops imposed by each read block operation.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HDFS-347) DFS read performance suboptimal when client co-located on nodes with data

Reply via email to