[ https://issues.apache.org/jira/browse/HDFS-2834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13223052#comment-13223052 ]
Todd Lipcon commented on HDFS-2834:
-----------------------------------

Hi Dhruba. Looks like your implementation provides for "zero copy" reads, since you can return mmapped memory directly. This is a neat trick, but a little different from this JIRA, where it still copies (just once) into a user-provided buffer. Also, looks like your implementation doesn't do any checksumming for this API, right?

Other concerns with the mmap approach are:
- it doesn't actually unmap when it goes "out of scope" as your comment indicates -- you need to wait on an actual GC to call the finalizers, which can cause the process to run out of address space on a 32-bit JVM if there isn't actual pressure on the Java heap
- the mmap() call will cause a TLB shootdown across all the threads - I'd be surprised if the API is actually faster for the case of smaller multi-threaded reads. See section 5.1 of this paper for more info: http://www.scribd.com/doc/59150636/C4-Continuously-Concurrent-Compacting-Collector

> ByteBuffer-based read API for DFSInputStream
> --------------------------------------------
>
>                 Key: HDFS-2834
>                 URL: https://issues.apache.org/jira/browse/HDFS-2834
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Henry Robinson
>            Assignee: Henry Robinson
>         Attachments: HDFS-2834-no-common.patch, HDFS-2834.3.patch, HDFS-2834.4.patch, HDFS-2834.5.patch, HDFS-2834.6.patch, HDFS-2834.patch, HDFS-2834.patch, hdfs-2834-libhdfs-benchmark.png
>
>
> The {{DFSInputStream}} read-path always copies bytes into a JVM-allocated {{byte[]}}. Although for many clients this is desired behaviour, in certain situations, such as native-reads through libhdfs, this imposes an extra copy penalty since the {{byte[]}} needs to be copied out again into a natively readable memory area.
> For these cases, it would be preferable to allow the client to supply its own buffer, wrapped in a {{ByteBuffer}}, to avoid that final copy overhead.
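
The two read styles contrasted in the comment can be illustrated with plain java.nio on a local file. This is only a sketch under stated assumptions: it does not use any HDFS class, and the names ReadStyles, readInto and mapRegion are hypothetical, chosen for illustration. readInto copies bytes once into a caller-supplied ByteBuffer (which may be a direct buffer visible to native code), while mapRegion returns mapped memory with no copy. The JDK exposes no public call to unmap the returned buffer, so the mapping persists until the buffer is garbage-collected, which is the address-space concern raised above.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration class; not part of Hadoop or of any patch on this JIRA.
public class ReadStyles {

    // Copy-once style: fill a buffer supplied by the caller, starting at the
    // given file offset. Returns the number of bytes actually read.
    static int readInto(Path file, long position, ByteBuffer dst) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            int total = 0;
            while (dst.hasRemaining()) {
                int n = ch.read(dst, position + total);
                if (n < 0) {
                    break; // end of file reached before the buffer was full
                }
                total += n;
            }
            return total;
        }
    }

    // Zero-copy style: return a read-only view over the mapped file region.
    // The mapping stays valid after the channel is closed and is released
    // only when the returned buffer is garbage-collected.
    static MappedByteBuffer mapRegion(Path file, long position, long length) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            return ch.map(FileChannel.MapMode.READ_ONLY, position, length);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("readstyles", ".bin");
        tmp.toFile().deleteOnExit();
        Files.write(tmp, "hello, hdfs".getBytes(StandardCharsets.US_ASCII));

        // Caller owns the buffer; a direct buffer avoids a further copy
        // when native code needs to read the bytes.
        ByteBuffer direct = ByteBuffer.allocateDirect(5);
        int n = readInto(tmp, 0, direct);
        direct.flip();
        System.out.println(n + " bytes copied: " + StandardCharsets.US_ASCII.decode(direct));

        // No copy at all, but the lifetime of the mapping is tied to GC.
        MappedByteBuffer mapped = mapRegion(tmp, 7, 4);
        System.out.println("mapped view: " + StandardCharsets.US_ASCII.decode(mapped));
    }
}
```

The copy-once variant gives the caller full control over when the buffer is released, at the cost of one memory copy per read; the mapped variant avoids the copy but leaves the timing of the unmap to the collector.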