Marco Leise wrote:
>I split the discussion with Andrei about the benefit of a
>multi-threaded file copy routine off into its own thread.
>This is about copying a file from and to the same HDD: a mechanical
>disk with seek times.
>
>My testing showed that Andrei is right to assume that the kernel can
>optimize the small reads and writes of a multi-threaded application.
>Only with buffers as large as 64 MB in my "single-threaded, 100%
>synchronized writes" version could I measure a difference: the simple
>multi-threaded version by Johannes Pfau then added 4.3% overhead
>during a 512 MB copy operation.
>
>Some more things I've experimented with:
>
>- using only system API calls instead of the D wrappers:
>   The difference is within background noise.
>
>- direct I/O for writing, as used by databases:
>   This worked pretty well, but you may not want to use it for
>   reading, as it bypasses the file cache. A file that is already
>   cached would be copied more slowly as a result.
>
>- memory maps:
>   Kernel memory (the page cache) is shared with user space. This
>   approach does not allocate buffer memory in the application; it
>   just makes the pages of the files directly accessible in user
>   space. Once mapped, the whole copy operation comes down to a
>   single 'memcpy' call.
>
>- splice (zero-copy):
>   This is a Linux system call that lets user space control data
>   transfers that happen entirely inside the kernel. The benefit is
>   that the CPU never copies this memory from kernel to user space.
>   Unfortunately, the copy operation goes like this:
>   "source file -> pipe, pipe -> destination file"
>   A pipe buffer is 64 KB by default, so it is not easy to move large
>   chunks of data in a single call to splice(). 512 MB still take
>   16,000+ calls (8,192 chunks of 64 KB, each spliced into the pipe
>   and then out of it).
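To make the splice bullet concrete, here is a minimal D sketch of the "source file -> pipe, pipe -> destination file" loop. It is not Marco's benchmark code. It assumes Linux, declares `splice` by hand from its C prototype (druntime may or may not already provide a binding), and uses a hypothetical `spliceCopy` helper that returns the number of splice() calls so the per-chunk cost is visible: every 64 KB chunk needs at least one call into the pipe and one call out of it.

```d
// Linux-only sketch: copy a file with splice(2) through a pipe.
import core.sys.posix.fcntl : O_CREAT, O_RDONLY, O_TRUNC, O_WRONLY, open;
import core.sys.posix.sys.types : ssize_t;
import core.sys.posix.unistd : close, pipe;
import std.conv : octal;
import std.exception : errnoEnforce;
import std.stdio : stderr, writefln;
import std.string : toStringz;

// Assumption: splice(2) declared by hand from its glibc C prototype
// (loff_t is a 64-bit integer).
extern (C) ssize_t splice(int fdIn, long* offIn, int fdOut, long* offOut,
                          size_t len, uint flags);

/// Hypothetical helper: copies `from` to `to` via a pipe and returns
/// how many splice() calls that took.
size_t spliceCopy(string from, string to)
{
    immutable int src = open(from.toStringz, O_RDONLY);
    errnoEnforce(src >= 0, "open " ~ from);
    scope (exit) close(src);
    immutable int dst = open(to.toStringz, O_WRONLY | O_CREAT | O_TRUNC, octal!"644");
    errnoEnforce(dst >= 0, "open " ~ to);
    scope (exit) close(dst);

    int[2] p; // p[0]: read end, p[1]: write end
    errnoEnforce(pipe(p) == 0, "pipe");
    scope (exit) { close(p[0]); close(p[1]); }

    enum size_t chunk = 64 * 1024; // assumed: the default Linux pipe capacity
    size_t calls;
    for (;;)
    {
        // source file -> pipe; null offsets use and advance the file position
        ssize_t n = splice(src, null, p[1], null, chunk, 0);
        ++calls;
        errnoEnforce(n >= 0, "splice from source");
        if (n == 0)
            break; // end of file
        // pipe -> destination file; loop in case of a short transfer
        while (n > 0)
        {
            immutable ssize_t m = splice(p[0], null, dst, null, cast(size_t) n, 0);
            ++calls;
            errnoEnforce(m > 0, "splice to destination");
            n -= m;
        }
    }
    return calls;
}

void main(string[] args)
{
    if (args.length != 3)
    {
        stderr.writefln("%s SOURCE DEST", args[0]);
        return;
    }
    writefln("%s splice() calls", spliceCopy(args[1], args[2]));
}
```

The data never enters a user-space buffer, but the loop still pays one system call per transfer, which is the overhead the bullet describes.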
>
>Although splice looks promising, it suffers from too many context
>switches. I had the best results with direct I/O and synchronized
>writes for buffer sizes from 8 MB upwards, but I found this too
>complex and probably system dependent. So I settled on the memory
>mapped version, which I rewrote using Phobos instead of POSIX calls,
>so it should run equally well on all platforms. Its core is 5 lines
>of code:
>
>----------------------------------------------------------------------
>
>import std.datetime.stopwatch : AutoStart, StopWatch;
>import std.mmfile : MmFile;
>import std.stdio : stderr, writefln;
>
>void main(string[] args)
>{
>     if (args.length != 3)
>     {
>         stderr.writefln("%s SOURCE DEST", args[0]);
>         return;
>     }
>
>     auto sw = StopWatch(AutoStart.yes);
>
>     // Map the whole source file read-only.
>     auto src = new MmFile(args[1], MmFile.Mode.read, 0, null, 0);
>     // Create the destination with the same size and map all of it.
>     auto dst = new MmFile(args[2], MmFile.Mode.readWriteNew,
>                           src.length, null, 0);
>     // The copy itself: one memcpy from one mapping into the other.
>     auto data = dst[];
>     data[] = src[];
>     dst.flush();    // flush the dirty pages to disk
>
>     sw.stop();
>     immutable usecs = sw.peek.total!"usecs";
>     writefln("Copied %s bytes in %s msec (%s kB/s)", src.length,
>              sw.peek.total!"msecs",
>              1_000_000 * src.length / (1024 * usecs));
>}
>
>----------------------------------------------------------------------
>
>This leaves it up to the kernel how to interleave disk reads and
>writes.
>
>- Marco
I changed the threaded implementation a little so that it doesn't
allocate buffers dynamically:
https://gist.github.com/1188128

I hope I didn't screw up there. The idea is to have two buffers: at
the same time, data is read from the source file into one buffer
while the other buffer is written to the destination file. When both
the read and the write have finished, the buffers are swapped.
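A minimal D sketch of the two-buffer scheme described above follows. It is not the code from the gist. It assumes `std.stdio.File.rawRead`/`rawWrite` for the I/O, uses a hypothetical `doubleBufferedCopy` function with an assumed default of 8 MB per buffer, and, for simplicity, starts a short-lived reader thread per chunk instead of keeping one worker alive. The two buffers are allocated once and only swap roles.

```d
// Sketch of a double-buffered file copy (not the gist's code).
import core.thread : Thread;
import std.stdio : File, stderr;

/// Hypothetical helper: copies `from` to `to` with two buffers allocated
/// once. While one buffer is written out, the next chunk is read into the other.
void doubleBufferedCopy(string from, string to, size_t bufSize = 8 * 1024 * 1024)
{
    assert(bufSize > 0, "File.rawRead needs a non-empty buffer");
    auto src = File(from, "rb");
    auto dst = File(to, "wb");
    ubyte[][2] buf = [new ubyte[bufSize], new ubyte[bufSize]];

    // A closure may not capture `src` itself (File has a destructor),
    // so the reader thread gets a pointer to it instead.
    File* input = &src;

    size_t cur = 0;                          // buffer that holds `filled`
    ubyte[] filled = src.rawRead(buf[cur]);  // prime: read the first chunk
    ubyte[] next, spare;
    while (filled.length > 0)
    {
        spare = buf[1 - cur];
        // Read the following chunk into the spare buffer on a second thread...
        auto reader = new Thread({ next = input.rawRead(spare); });
        reader.start();
        // ...while this thread writes out the chunk that is already in memory.
        dst.rawWrite(filled);
        reader.join();                       // rethrows a failure in the reader
        // Both the read and the write are done: swap the buffers' roles.
        filled = next;
        cur = 1 - cur;
    }
}

void main(string[] args)
{
    if (args.length != 3)
    {
        stderr.writefln("%s SOURCE DEST", args[0]);
        return;
    }
    doubleBufferedCopy(args[1], args[2]);
}
```

Because the read of chunk n+1 overlaps the write of chunk n, both requests are outstanding at the same time and the kernel is free to order them on the disk.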
-- 
Johannes Pfau
