[ https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855205#action_12855205 ]
bc Wong commented on HDFS-941:
------------------------------

I replaced the size-of-one cache with a more generic cache, which is also a global shared cache. There is a new TestParallelRead, which tests the use of a DFSInputStream by concurrent readers. There's a clear speed difference with vs. without the patch. Each thread does 1024 reads.

Trunk:
{noformat}
Report: 4 threads read 236953 KB (across 1 file(s)) in 5.879s; average 40304.98384078925 KB/s
Report: 4 threads read 238873 KB (across 1 file(s)) in 5.063s; average 47180.13035749556 KB/s
Report: 4 threads read 236068 KB (across 1 file(s)) in 5.93s; average 39809.10623946037 KB/s
Report: 16 threads read 942666 KB (across 1 file(s)) in 13.524s; average 69703.19432120674 KB/s
Report: 16 threads read 947015 KB (across 1 file(s)) in 13.401s; average 70667.48750093277 KB/s
Report: 16 threads read 948768 KB (across 1 file(s)) in 12.932s; average 73365.91401175379 KB/s
Report: 8 threads read 469529 KB (across 2 file(s)) in 5.436s; average 86373.98822663723 KB/s
Report: 8 threads read 455428 KB (across 2 file(s)) in 5.363s; average 84920.38038411336 KB/s
Report: 8 threads read 469005 KB (across 2 file(s)) in 5.713s; average 82094.34622790127 KB/s
{noformat}

Patched:
{noformat}
Report: 4 threads read 236845 KB (across 1 file(s)) in 3.612s; average 65571.70542635658 KB/s
Report: 4 threads read 238803 KB (across 1 file(s)) in 4.371s; average 54633.49347975291 KB/s
Report: 4 threads read 240241 KB (across 1 file(s)) in 4.395s; average 54662.34357224119 KB/s
Report: 16 threads read 938652 KB (across 1 file(s)) in 9.044s; average 103787.26227333037 KB/s
Report: 16 threads read 943999 KB (across 1 file(s)) in 8.59s; average 109895.11059371362 KB/s
Report: 16 threads read 938546 KB (across 1 file(s)) in 9.081s; average 103352.71445876005 KB/s
Report: 8 threads read 478534 KB (across 2 file(s)) in 3.376s; average 141745.85308056872 KB/s
Report: 8 threads read 467412 KB (across 2 file(s)) in 3.623s; average 129012.42064587357 KB/s
Report: 8 threads read 475349 KB (across 2 file(s)) in 3.49s; average 136203.15186246418 KB/s
{noformat}

bq. The edits to the docs in DataNode.java are good - if possible they should probably move into HDFS-1001 though, no?

The addition to the docs doesn't apply to HDFS-1001, in which the DataXceiver still actively closes all sockets after each use.

Todd, the new patch addresses the rest of your comments.

> Datanode xceiver protocol should allow reuse of a connection
> ------------------------------------------------------------
>
>                 Key: HDFS-941
>                 URL: https://issues.apache.org/jira/browse/HDFS-941
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: data-node, hdfs client
>    Affects Versions: 0.22.0
>            Reporter: Todd Lipcon
>            Assignee: bc Wong
>         Attachments: HDFS-941-1.patch, HDFS-941-2.patch
>
>
> Right now each connection into the datanode xceiver only processes one
> operation.
> In the case that an operation leaves the stream in a well-defined state (e.g. a
> client reads to the end of a block successfully), the same connection could be
> reused for a second operation. This should improve random read performance
> significantly.
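
For readers unfamiliar with the idea, the kind of shared connection cache discussed in this issue could be sketched roughly as below. This is only an illustration under stated assumptions and is not the code from the attached patches: the class name ConnectionCache, its methods take/put/size, and the policy of refusing new entries when full are all hypothetical. The connection type is left generic so the sketch runs without real sockets.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of a bounded cache of idle connections shared by
 * all readers and keyed by remote address. Not the class from the patch.
 */
final class ConnectionCache<K, C> {
    private final int capacity;
    private final Map<K, Deque<C>> idle = new HashMap<>();
    private int size = 0;

    ConnectionCache(int capacity) {
        if (capacity < 0) {
            throw new IllegalArgumentException("capacity must be >= 0");
        }
        this.capacity = capacity;
    }

    /** Returns an idle connection for the key, or null if none is cached. */
    synchronized C take(K key) {
        Deque<C> queue = idle.get(key);
        if (queue == null) {
            return null; // caller opens a fresh connection
        }
        C conn = queue.pollFirst();
        if (queue.isEmpty()) {
            idle.remove(key); // never keep empty queues around
        }
        size--;
        return conn;
    }

    /**
     * Offers a connection that was left in a well-defined state.
     * Returns false when the cache is full; the caller should then
     * close the connection itself (assumed policy for this sketch).
     */
    synchronized boolean put(K key, C conn) {
        if (size >= capacity) {
            return false;
        }
        idle.computeIfAbsent(key, k -> new ArrayDeque<>()).addLast(conn);
        size++;
        return true;
    }

    /** Number of idle connections currently cached. */
    synchronized int size() {
        return size;
    }
}
```

A reader would call take() before opening a new connection to a datanode, and put() only after an operation that finished cleanly (for example, a read to the end of a block); a connection in any other state would be closed rather than cached.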