mmap() should be avoided unless the workload is fairly static, meaning:
1. All the mapping is set up once at the start of the program; there is no constant setting up and tearing down of mmaps.
2. The mmaped data is accessed many times and is unlikely to be paged out; the total number of mapped pages is significantly smaller than memory size.
The setup/teardown costs of mmap, as well as the cost of mapping and unmapping pages, are very high and are likely to result in a net loss unless they are amortized over a large number of reads to the same pages.
So, for your use case I recommend traditional synchronous reads. I assume this is option 2) in your terminology.
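A minimal sketch of that recommendation (option 2: positional FileChannel.read into a direct ByteBuffer). The pooling is elided here; `readChunk` and its parameters are illustrative names, not part of any library:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ChunkReader {
    // Reads the byte range [offset, offset + length) into the given direct
    // buffer. In a real uploader the buffer would come from a pool and be
    // returned to it after AHC has written it to the socket.
    static ByteBuffer readChunk(FileChannel ch, long offset, int length,
                                ByteBuffer buf) throws IOException {
        buf.clear().limit(length);
        long pos = offset;
        while (buf.hasRemaining()) {
            int n = ch.read(buf, pos);   // positional read, channel position unchanged
            if (n < 0) break;            // EOF before the range was filled
            pos += n;
        }
        buf.flip();
        return buf;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("chunk", ".dat");
        Files.write(tmp, "0123456789abcdef".getBytes(StandardCharsets.US_ASCII));
        try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocateDirect(4);   // pooled in practice
            ByteBuffer chunk = readChunk(ch, 10, 4, buf);
            byte[] out = new byte[chunk.remaining()];
            chunk.get(out);
            System.out.println(new String(out, StandardCharsets.US_ASCII));
        } finally {
            Files.delete(tmp);
        }
    }
}
```

Because the read is positional, several chunks can be read concurrently from the same FileChannel without coordinating a shared file position.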
On 09/10/2017 04:08 PM, 'Martin Grotzke' via mechanical-sympathy wrote:
Hi,

TL;DR: my question covers the difference between MappedByteBuffer vs. direct ByteBuffers when uploading chunks of a file from NFS.

Details: I want to upload file chunks to some cloud storage. Input files are several GB large (say something between 1 and 100), accessed via NFS on 64-bit Linux/CentOS. An input file has to be split into chunks of ~1 to 10 MB (a file has to be split by some index, i.e. I have a list of byte ranges for a file). I'm planning to use async-http-client (AHC) to upload file chunks via `setBody(ByteBuffer)` [1].

My two favourites for splitting the file into chunks (ByteBuffers) are:
1) FileChannel.map -> MappedByteBuffer
2) FileChannel.read(ByteBuffer) -> a (pooled) direct ByteBuffer

My understanding of 1) is that the MappedByteBuffer would represent a segment of virtual memory, so that the OS would not even have to load the data from NFS during mmap'ing, as long as the MappedByteBuffer is not read. When AHC/netty writes the buffer to the output (socket) channel, the OS/kernel loads data from NFS into the page cache and then writes these pages to the network socket (to be honest, I have no clue how the NFS API works and how the kernel loads the file chunks). Is this understanding correct?

My understanding of 2) is that on FileChannel.read(ByteBuffer) the OS would read data from NFS and copy it into the memory region backing the direct ByteBuffer. When AHC/netty writes the ByteBuffer to the output channel, the OS would copy data from that memory region to the network socket. Is this understanding correct?

Based on these assumptions, 1) should be _a bit_ more efficient than 2), but not significantly. With 1) my concern is that it's not possible to unmap the memory-mapped file [2], so I have less control over native memory usage. Therefore my preference currently is 2), using pooled direct ByteBuffers. What do you think about this concern? Is there an even better way than 1) and 2) to achieve what I want?
Thanks && cheers,
Martin

[1] https://github.com/AsyncHttpClient/async-http-client/blob/master/client/src/main/java/org/asynchttpclient/RequestBuilderBase.java#L390
[2] http://bugs.java.com/view_bug.do?bug_id=4724038
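For contrast, a minimal sketch of option 1 as described above: one FileChannel.map call per chunk, yielding a MappedByteBuffer whose pages are faulted in lazily on first access. File name and offsets are illustrative:

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MappedChunk {
    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("mapped", ".dat");
        Files.write(tmp, "0123456789abcdef".getBytes(StandardCharsets.US_ASCII));
        try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.READ)) {
            // Map only the byte range of this chunk; no data is read from the
            // file until the buffer is actually accessed (pages fault in lazily).
            MappedByteBuffer chunk = ch.map(FileChannel.MapMode.READ_ONLY, 10, 4);
            byte[] out = new byte[chunk.remaining()];
            chunk.get(out);   // first access: pages are faulted in here
            System.out.println(new String(out, StandardCharsets.US_ASCII));
            // Note: there is no supported way to unmap eagerly (see [2],
            // JDK-4724038); the mapping is released only when the buffer
            // is garbage collected, which is exactly the concern raised above.
        } finally {
            Files.delete(tmp);
        }
    }
}
```

This is the per-chunk map/unmap churn the reply warns against: each chunk pays the mapping setup cost and the page faults, which are only worth it when the same pages are read many times.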
