[ https://issues.apache.org/jira/browse/HDFS-2834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13223093#comment-13223093 ]
dhruba borthakur commented on HDFS-2834:
----------------------------------------

Agreed, Todd. Instead of subverting this jira, maybe we can open a new jira and discuss how we can get scatter/gather into the FSDataInputStream API. Also, it is better to mmap the entire file once instead of mmapping only the relevant offsets on demand. I will open a JIRA.

> ByteBuffer-based read API for DFSInputStream
> --------------------------------------------
>
>                 Key: HDFS-2834
>                 URL: https://issues.apache.org/jira/browse/HDFS-2834
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Henry Robinson
>            Assignee: Henry Robinson
>         Attachments: HDFS-2834-no-common.patch, HDFS-2834.3.patch, HDFS-2834.4.patch, HDFS-2834.5.patch, HDFS-2834.6.patch, HDFS-2834.patch, HDFS-2834.patch, hdfs-2834-libhdfs-benchmark.png
>
>
> The {{DFSInputStream}} read-path always copies bytes into a JVM-allocated
> {{byte[]}}. Although for many clients this is desired behaviour, in certain
> situations, such as native-reads through libhdfs, this imposes an extra copy
> penalty since the {{byte[]}} needs to be copied out again into a natively
> readable memory area.
> For these cases, it would be preferable to allow the client to supply its own
> buffer, wrapped in a {{ByteBuffer}}, to avoid that final copy overhead.
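
For readers following the thread, the extra copy that the issue description refers to can be sketched with plain JDK types. This is only an illustration under stated assumptions: `ByteBufferReadSketch`, `Source`, `readViaArray`, and `readIntoBuffer` are hypothetical names, and the in-memory `Source` stands in for the stream. None of this is the code in the attached patches.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class ByteBufferReadSketch {

    /** Hypothetical in-memory stand-in for the data behind an input stream. */
    static final class Source {
        private final ByteBuffer data;

        Source(byte[] bytes) {
            this.data = ByteBuffer.wrap(bytes);
        }

        /** byte[]-style read: fills a heap array supplied by the caller. */
        int read(byte[] buf, int off, int len) {
            if (!data.hasRemaining()) {
                return -1; // end of data
            }
            int n = Math.min(len, data.remaining());
            data.get(buf, off, n);
            return n;
        }

        /** ByteBuffer-style read: fills the caller's buffer directly. */
        int read(ByteBuffer dst) {
            if (!data.hasRemaining()) {
                return -1; // end of data
            }
            int n = Math.min(dst.remaining(), data.remaining());
            ByteBuffer slice = data.duplicate();
            slice.limit(slice.position() + n);
            dst.put(slice);                      // single copy into dst
            data.position(data.position() + n);  // advance the source
            return n;
        }
    }

    /** Existing pattern: read into a byte[], then copy again into the native buffer. */
    static int readViaArray(Source src, ByteBuffer nativeBuf) {
        byte[] tmp = new byte[nativeBuf.remaining()];
        int n = src.read(tmp, 0, tmp.length);
        if (n > 0) {
            nativeBuf.put(tmp, 0, n); // the extra copy the issue wants to avoid
        }
        return n;
    }

    /** Proposed pattern: the client supplies its own buffer and the read fills it. */
    static int readIntoBuffer(Source src, ByteBuffer nativeBuf) {
        return src.read(nativeBuf);
    }

    public static void main(String[] args) {
        byte[] bytes = "hello hdfs".getBytes(StandardCharsets.US_ASCII);

        // Direct buffers model the "natively readable memory area".
        ByteBuffer a = ByteBuffer.allocateDirect(bytes.length);
        ByteBuffer b = ByteBuffer.allocateDirect(bytes.length);

        int n1 = readViaArray(new Source(bytes), a);
        int n2 = readIntoBuffer(new Source(bytes), b);

        a.flip();
        b.flip();
        System.out.println(n1 + " bytes via byte[], " + n2
                + " bytes via ByteBuffer, same contents: " + a.equals(b));
    }
}
```

Both paths deliver the same bytes. The difference is that `readViaArray` touches every byte twice on the client side, while `readIntoBuffer` writes once into memory the client already owns.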