[ https://issues.apache.org/jira/browse/HDFS-347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13534295#comment-13534295 ]
Colin Patrick McCabe commented on HDFS-347:
-------------------------------------------

I did some manual testing and found that the new implementation works with Kerberos enabled. I also found that it is competitive with the old local block reader implementation on my test, which was catting a 1 GB file 7 times from FsShell.

Numbers for the old local reads implementation:
{code}
cmccabe@keter:/h> /usr/bin/time bash -c './bin/hadoop fs -cat /zero /zero /zero /zero /zero /zero /zero &> /dev/null'
4.34user 1.30system 0:04.27elapsed 132%CPU (0avgtext+0avgdata 418592maxresident)k
0inputs+88outputs (0major+25448minor)pagefaults 0swaps
cmccabe@keter:/h> /usr/bin/time bash -c './bin/hadoop fs -cat /zero /zero /zero /zero /zero /zero /zero &> /dev/null'
4.34user 1.27system 0:04.28elapsed 131%CPU (0avgtext+0avgdata 419456maxresident)k
0inputs+72outputs (0major+24315minor)pagefaults 0swaps
cmccabe@keter:/h> /usr/bin/time bash -c './bin/hadoop fs -cat /zero /zero /zero /zero /zero /zero /zero &> /dev/null'
4.51user 1.29system 0:04.31elapsed 134%CPU (0avgtext+0avgdata 450320maxresident)k
0inputs+72outputs (0major+25563minor)pagefaults 0swaps
{code}

New implementation:
{code}
cmccabe@keter:/h> /usr/bin/time ./bin/hadoop fs -cat /zero /zero /zero /zero /zero /zero /zero &> /dev/null
cmccabe@keter:/h> /usr/bin/time bash -c './bin/hadoop fs -cat /zero /zero /zero /zero /zero /zero /zero &> /dev/null'
4.35user 1.28system 0:04.43elapsed 127%CPU (0avgtext+0avgdata 421520maxresident)k
0inputs+72outputs (0major+25717minor)pagefaults 0swaps
cmccabe@keter:/h> /usr/bin/time bash -c './bin/hadoop fs -cat /zero /zero /zero /zero /zero /zero /zero &> /dev/null'
4.28user 1.24system 0:04.41elapsed 125%CPU (0avgtext+0avgdata 424480maxresident)k
0inputs+72outputs (0major+24634minor)pagefaults 0swaps
cmccabe@keter:/h> /usr/bin/time bash -c './bin/hadoop fs -cat /zero /zero /zero /zero /zero /zero /zero &> /dev/null'
4.36user 1.30system 0:04.51elapsed 125%CPU (0avgtext+0avgdata 453280maxresident)k
0inputs+80outputs (0major+25360minor)pagefaults 0swaps
{code}

No local reads:
{code}
cmccabe@keter:/h> /usr/bin/time bash -c './bin/hadoop fs -cat /zero /zero /zero /zero /zero /zero /zero &> /dev/null'
6.01user 3.15system 0:08.06elapsed 113%CPU (0avgtext+0avgdata 434000maxresident)k
0inputs+64outputs (0major+25949minor)pagefaults 0swaps
cmccabe@keter:/h> /usr/bin/time bash -c './bin/hadoop fs -cat /zero /zero /zero /zero /zero /zero /zero &> /dev/null'
5.24user 3.15system 0:07.09elapsed 118%CPU (0avgtext+0avgdata 443088maxresident)k
0inputs+64outputs (0major+24773minor)pagefaults 0swaps
cmccabe@keter:/h> /usr/bin/time bash -c './bin/hadoop fs -cat /zero /zero /zero /zero /zero /zero /zero &> /dev/null'
5.32user 3.13system 0:07.16elapsed 118%CPU (0avgtext+0avgdata 445472maxresident)k
0inputs+64outputs (0major+24819minor)pagefaults 0swaps
{code}
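For reference, a minimal sketch of the same measurement done through the FileSystem API rather than FsShell. This is an illustration only: the configuration key shown is an assumption and may differ between the old and new short-circuit implementations, and /zero is simply the test file used above.
{code}
import java.io.InputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CatBench {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Assumption: the key that enables short-circuit local reads;
    // the exact name may vary across patch revisions.
    conf.setBoolean("dfs.client.read.shortcircuit", true);
    FileSystem fs = FileSystem.get(conf);
    byte[] buf = new byte[64 * 1024];
    long start = System.nanoTime();
    for (int i = 0; i < 7; i++) {        // cat the 1 GB file 7 times
      InputStream in = fs.open(new Path("/zero"));
      try {
        while (in.read(buf) >= 0) {
          // discard the data, like redirecting to /dev/null
        }
      } finally {
        in.close();
      }
    }
    System.out.printf("elapsed: %.2fs%n",
        (System.nanoTime() - start) / 1e9);
  }
}
{code}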
> DFS read performance suboptimal when client co-located on nodes with data
> -------------------------------------------------------------------------
>
>                 Key: HDFS-347
>                 URL: https://issues.apache.org/jira/browse/HDFS-347
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode, hdfs-client, performance
>            Reporter: George Porter
>            Assignee: Colin Patrick McCabe
>         Attachments: all.tsv, BlockReaderLocal1.txt, HADOOP-4801.1.patch, HADOOP-4801.2.patch, HADOOP-4801.3.patch, HDFS-347-016_cleaned.patch, HDFS-347.016.patch, HDFS-347.017.clean.patch, HDFS-347.017.patch, HDFS-347.018.clean.patch, HDFS-347.018.patch2, HDFS-347.019.patch, HDFS-347.020.patch, HDFS-347.021.patch, HDFS-347.022.patch, HDFS-347.024.patch, HDFS-347.025.patch, HDFS-347.026.patch, HDFS-347.027.patch, HDFS-347.029.patch, HDFS-347-branch-20-append.txt, hdfs-347.png, hdfs-347.txt, local-reads-doc
>
> One of the major strategies Hadoop uses to get scalable data processing is to move the code to the data. However, putting the DFS client on the same physical node as the data blocks it acts on doesn't improve read performance as much as expected.
>
> After looking at Hadoop and O/S traces (via HADOOP-4049), I think the problem is due to the HDFS streaming protocol causing many more read I/O operations (iops) than necessary. Consider the case of a DFSClient fetching a 64 MB disk block from the DataNode process (running in a separate JVM) on the same machine. The DataNode will satisfy the single disk block request by sending data back to the HDFS client in 64 KB chunks. In BlockSender.java, this is done in the sendChunk() method, relying on Java's transferTo() method. Depending on the host O/S and JVM implementation, transferTo() is implemented as either a sendfilev() syscall or a pair of mmap() and write(). In either case, each chunk is read from the disk with a separate I/O operation, so the single request for a 64 MB block ends up hitting the disk as over a thousand smaller 64 KB requests.
>
> Since the DFSClient runs in a different JVM and process than the DataNode, shuttling data from the disk to the DFSClient also causes context switches each time network packets get sent (in this case, each 64 KB chunk turns into a large number of 1500-byte packet send operations). Thus we see a large number of context switches for each block send operation.
>
> I'd like to get some feedback on the best way to address this, but I think the answer is to provide a mechanism for a DFSClient to directly open data blocks that happen to be on the same machine. It could do this by examining the set of LocatedBlocks returned by the NameNode and marking those that should be resident on the local host. Since the DataNode and DFSClient (probably) share the same Hadoop configuration, the DFSClient should be able to find the files holding the block data, open them directly, and serve the data to the client itself. This would avoid the context switches imposed by the network layer and would allow for much larger read buffers than 64 KB, which should reduce the number of iops imposed by each read block operation.
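To make the iop arithmetic in the quoted report concrete, here is a simplified sketch of the chunked-send loop it describes. This is not the actual BlockSender code; the class and method names here are made up for illustration.
{code}
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;

public class ChunkedSend {
  /**
   * Simplified model of the send loop described above: one transferTo()
   * call per 64 KB chunk. For a cold 64 MB block, each call can reach
   * the disk as a separate I/O operation, so one block read becomes
   * 64 MB / 64 KB = 1024 iops.
   */
  static void sendBlock(FileChannel blockFile, WritableByteChannel sock)
      throws java.io.IOException {
    final long CHUNK = 64 * 1024;
    long pos = 0;
    long len = blockFile.size();
    while (pos < len) {
      pos += blockFile.transferTo(pos, Math.min(CHUNK, len - pos), sock);
    }
  }
}
{code}
Reading the block file directly in the client, as the report proposes, replaces this per-chunk syscall pattern with ordinary reads against a locally opened file, where the client is free to use a much larger buffer.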