[ https://issues.apache.org/jira/browse/HDFS-2834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13222969#comment-13222969 ]
Henry Robinson commented on HDFS-2834:
--------------------------------------

Thanks for the review! Per your first two questions:

* There's no significant difference in my benchmarks with the old copying path doing the same experiment:

|| ||Native Checksums||No Checksums||Non-native Checksums||Remote, Native Checksums||
|Copying (MB/s) - 32k buffer and request size|2010.21|2290.50|721.52|1412.20|
|Old copying path - 32k buffer and request size|2087.43|2232.67|708.67|1365.60|

* I've run the modified TestParallelRead tests for a couple of hours, but I plan to do a soak test overnight with the full suite before this gets committed.

> ByteBuffer-based read API for DFSInputStream
> --------------------------------------------
>
>                 Key: HDFS-2834
>                 URL: https://issues.apache.org/jira/browse/HDFS-2834
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Henry Robinson
>            Assignee: Henry Robinson
>         Attachments: HDFS-2834-no-common.patch, HDFS-2834.3.patch,
> HDFS-2834.4.patch, HDFS-2834.5.patch, HDFS-2834.6.patch, HDFS-2834.patch,
> HDFS-2834.patch, hdfs-2834-libhdfs-benchmark.png
>
>
> The {{DFSInputStream}} read-path always copies bytes into a JVM-allocated
> {{byte[]}}. Although for many clients this is desired behaviour, in certain
> situations, such as native-reads through libhdfs, this imposes an extra copy
> penalty since the {{byte[]}} needs to be copied out again into a natively
> readable memory area.
> For these cases, it would be preferable to allow the client to supply its own
> buffer, wrapped in a {{ByteBuffer}}, to avoid that final copy overhead.
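
For readers following the issue description, here is a rough sketch of the two read paths it contrasts. It is not taken from any of the attached patches: {{BufferReadable}}, {{InMemorySource}} and {{CopyPathSketch}} are hypothetical names invented for illustration, and an in-memory source stands in for {{DFSInputStream}}. The first method reads into a JVM {{byte[]}} and then copies into the caller's buffer; the second lets the source fill a caller-supplied {{ByteBuffer}} directly.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;

// Hypothetical interface (an assumption, not the patch's API): a source that
// can fill a caller-supplied ByteBuffer.
interface BufferReadable {
    int read(ByteBuffer dst) throws IOException;
}

// In-memory stand-in for the real stream, supporting both read styles.
final class InMemorySource extends InputStream implements BufferReadable {
    private final ByteBuffer data;

    InMemorySource(byte[] bytes) {
        this.data = ByteBuffer.wrap(bytes);
    }

    @Override
    public int read() {
        return data.hasRemaining() ? (data.get() & 0xff) : -1;
    }

    @Override
    public int read(byte[] b, int off, int len) {
        if (len == 0) return 0;
        if (!data.hasRemaining()) return -1;
        int n = Math.min(len, data.remaining());
        data.get(b, off, n);
        return n;
    }

    @Override
    public int read(ByteBuffer dst) {
        if (!data.hasRemaining()) return -1;
        int n = Math.min(dst.remaining(), data.remaining());
        ByteBuffer slice = data.duplicate();
        slice.limit(slice.position() + n);
        dst.put(slice);                      // source -> caller's buffer, one copy
        data.position(data.position() + n);
        return n;
    }
}

final class CopyPathSketch {
    // Copying path: source -> temporary byte[] -> caller's buffer (two copies).
    static int readViaByteArray(InputStream in, ByteBuffer dst, int requestSize)
            throws IOException {
        byte[] tmp = new byte[requestSize];
        int total = 0;
        while (dst.hasRemaining()) {
            int n = in.read(tmp, 0, Math.min(tmp.length, dst.remaining()));
            if (n < 0) break;
            dst.put(tmp, 0, n);              // the extra copy the issue describes
            total += n;
        }
        return total;
    }

    // ByteBuffer path: the source writes straight into the caller's buffer.
    static int readIntoBuffer(BufferReadable src, ByteBuffer dst) throws IOException {
        int total = 0;
        while (dst.hasRemaining()) {
            int n = src.read(dst);
            if (n < 0) break;
            total += n;
        }
        return total;
    }
}
```

A native client would typically pass a direct buffer (for example one from {{ByteBuffer.allocateDirect}}) as {{dst}}, so the bytes land in natively readable memory without the intermediate {{byte[]}}. Both methods produce the same bytes; only the number of copies differs.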