mmap() should be avoided unless the workload is fairly static, meaning:
1. All the mapping is set up once at the start of the program; there is no constant setting up and tearing down of mmaps.
2. The mmaped data is accessed many times and is unlikely to be paged out; the total number of mapped pages is significantly smaller than memory size.
The setup/teardown costs of mmap, as well as the cost of mapping and unmapping pages, are very high and are likely to result in a net loss unless they are amortized over a large number of reads to the same pages.
So, for your use case I recommend traditional synchronous reads. I assume this is option 2) in your terminology.
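A minimal sketch of that recommendation (option 2: positional FileChannel.read into a direct ByteBuffer). The pooling is elided here; `readChunk` and its parameters are illustrative names, not part of any library:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ChunkReader {
    // Reads the byte range [offset, offset + length) into the given direct
    // buffer. In a real uploader the buffer would come from a pool and be
    // returned to it after AHC has written it to the socket.
    static ByteBuffer readChunk(FileChannel ch, long offset, int length,
                                ByteBuffer buf) throws IOException {
        buf.clear().limit(length);
        long pos = offset;
        while (buf.hasRemaining()) {
            int n = ch.read(buf, pos);   // positional read, channel position unchanged
            if (n < 0) break;            // EOF before the range was filled
            pos += n;
        }
        buf.flip();
        return buf;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("chunk", ".dat");
        Files.write(tmp, "0123456789abcdef".getBytes(StandardCharsets.US_ASCII));
        try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocateDirect(4);   // pooled in practice
            ByteBuffer chunk = readChunk(ch, 10, 4, buf);
            byte[] out = new byte[chunk.remaining()];
            chunk.get(out);
            System.out.println(new String(out, StandardCharsets.US_ASCII));
        } finally {
            Files.delete(tmp);
        }
    }
}
```

Because the read is positional, several chunks can be read concurrently from the same FileChannel without coordinating a shared file position.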
On 09/10/2017 04:08 PM, 'Martin Grotzke' via mechanical-sympathy wrote:
Hi,

TL;DR: my question covers the difference between MappedByteBuffer vs. direct ByteBuffers when uploading chunks of a file from NFS.

Details: I want to upload file chunks to some cloud storage. Input files are several GB large (say something between 1 and 100), accessed via NFS on 64-bit Linux/CentOS. An input file has to be split into chunks of ~1 to 10 MB (a file has to be split by some index, i.e. I have a list of byte ranges for a file). I'm planning to use async-http-client (AHC) to upload file chunks via `setBody(ByteBuffer)` [1].

My two favourites for splitting the file into chunks (ByteBuffers) are:
1) FileChannel.map -> MappedByteBuffer
2) FileChannel.read(ByteBuffer) -> a (pooled) direct ByteBuffer

My understanding of 1) is that the MappedByteBuffer would represent a segment of virtual memory, so that the OS would not even have to load the data from NFS during mmap'ing, as long as the MappedByteBuffer is not read. When AHC/netty writes the buffer to the output (socket) channel, the OS/kernel loads data from NFS into the page cache and then writes these pages to the network socket (to be honest, I have no clue how the NFS API works and how the kernel loads the file chunks). Is this understanding correct?

My understanding of 2) is that on FileChannel.read(ByteBuffer) the OS would read data from NFS and copy it into the memory region backing the direct ByteBuffer. When AHC/netty writes the ByteBuffer to the output channel, the OS would copy data from that memory region to the network socket. Is this understanding correct?

Based on these assumptions, 1) should be _a bit_ more efficient than 2), but not significantly. With 1) my concern is that it's not possible to unmap the memory-mapped file [2], so I have less control over native memory usage. Therefore my preference currently is 2), using pooled direct ByteBuffers. What do you think about this concern? Is there an even better way than 1) and 2) to achieve what I want?
Thanks && cheers,
Martin

[1] https://github.com/AsyncHttpClient/async-http-client/blob/master/client/src/main/java/org/asynchttpclient/RequestBuilderBase.java#L390
[2] http://bugs.java.com/view_bug.do?bug_id=4724038
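For contrast, a minimal sketch of option 1 as described above: one FileChannel.map call per chunk, yielding a MappedByteBuffer whose pages are faulted in lazily on first access. File name and offsets are illustrative:

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MappedChunk {
    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("mapped", ".dat");
        Files.write(tmp, "0123456789abcdef".getBytes(StandardCharsets.US_ASCII));
        try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.READ)) {
            // Map only the byte range of this chunk; no data is read from the
            // file until the buffer is actually accessed (pages fault in lazily).
            MappedByteBuffer chunk = ch.map(FileChannel.MapMode.READ_ONLY, 10, 4);
            byte[] out = new byte[chunk.remaining()];
            chunk.get(out);   // first access: pages are faulted in here
            System.out.println(new String(out, StandardCharsets.US_ASCII));
            // Note: there is no supported way to unmap eagerly (see [2],
            // JDK-4724038); the mapping is released only when the buffer
            // is garbage collected, which is exactly the concern raised above.
        } finally {
            Files.delete(tmp);
        }
    }
}
```

This is the per-chunk map/unmap churn the reply warns against: each chunk pays the mapping setup cost and the page faults, which are only worth it when the same pages are read many times.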
