Marco Leise wrote:
>I've split my discussion with Andrei about the benefits of a
>multi-threaded file copy routine into its own thread.
>This is about copying a file to and from the same HDD, i.e. a
>mechanical disk with seek times.
>
>My testing showed that Andrei is right to assume that the kernel can
>optimize the small reads and writes of a multi-threaded application.
>Only when I used large buffers (up to 64 MB) in my "single-threaded,
>100% synchronized writes" version did Johannes Pfau's simple
>multi-threaded version show a measurable overhead: 4.3% during a
>512 MB copy operation.
>
>Some more things I've experimented with:
>
>- using only system API calls instead of the D wrappers:
> The difference is close to background noise.
>
>- direct I/O for writing, as used by databases:
> This worked pretty well, but you may not want to use it for
> reading, as it bypasses the file cache: a file that is already
> cached would be copied more slowly as a result.
>
>- memory maps:
> The files' pages in the kernel's page cache are mapped into the
> process's address space. This approach does not allocate any
> buffer in the application; it just makes the pages of the files
> directly accessible in user space. Once both files are mapped, the
> whole copy operation comes down to a single 'memcpy' call.
>
>- splice (zero-copy):
> This is a Linux system call that moves data between file
> descriptors inside the kernel, under the control of user space.
> The benefit is that the CPU never copies the data from kernel to
> user space. Unfortunately, the copy operation has to go like this:
> "source file -> pipe, pipe -> destination file"
> A pipe buffer holds 64 KB by default, so it is not easy to move
> large chunks of data in a single call to splice(). 512 MB still
> need at least 8,192 round trips of two calls each (16,000+ calls).
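A minimal sketch of that splice() round trip, written in D and assuming Linux. Druntime is not assumed to bind splice(2), so the extern(C) declaration and the SPLICE_F_MOVE value (1, from <fcntl.h>) are hand-written here; spliceCopy is a hypothetical helper name, not code from either benchmark. Each outer iteration moves at most 64 KB (the default pipe capacity), which is why a large file needs thousands of calls.

```d
import core.sys.posix.fcntl : O_CREAT, O_RDONLY, O_TRUNC, O_WRONLY, open;
import core.sys.posix.sys.types : ssize_t;
import core.sys.posix.unistd : close, pipe;
import std.conv : octal;
import std.exception : errnoEnforce;
import std.string : toStringz;

// Hand-written binding (assumption): splice(2) from <fcntl.h>, Linux only.
// loff_t is 64-bit on Linux, hence long*.
extern (C) ssize_t splice(int fdIn, long* offIn, int fdOut, long* offOut,
                          size_t len, uint flags) nothrow @nogc;

enum uint SPLICE_F_MOVE = 1; // value from <fcntl.h>; only a hint to the kernel

/// Copies srcPath to dstPath through a pipe, without a user-space buffer.
void spliceCopy(string srcPath, string dstPath)
{
    int inFd = open(srcPath.toStringz, O_RDONLY);
    errnoEnforce(inFd >= 0, "open " ~ srcPath);
    scope (exit) close(inFd);

    int outFd = open(dstPath.toStringz, O_WRONLY | O_CREAT | O_TRUNC,
                     octal!"644");
    errnoEnforce(outFd >= 0, "open " ~ dstPath);
    scope (exit) close(outFd);

    int[2] p; // p[0] = read end, p[1] = write end
    errnoEnforce(pipe(p) == 0, "pipe");
    scope (exit) { close(p[0]); close(p[1]); }

    enum size_t chunk = 64 * 1024; // default Linux pipe capacity

    for (;;)
    {
        // Step 1: source file -> pipe (the data stays inside the kernel).
        ssize_t n = splice(inFd, null, p[1], null, chunk, SPLICE_F_MOVE);
        errnoEnforce(n >= 0, "splice from source");
        if (n == 0)
            break; // end of file

        // Step 2: pipe -> destination file; drain all that step 1 put in.
        while (n > 0)
        {
            ssize_t m = splice(p[0], null, outFd, null, cast(size_t) n,
                               SPLICE_F_MOVE);
            errnoEnforce(m > 0, "splice to destination");
            n -= m;
        }
    }
}
```

Because both calls stay inside the kernel, the data never reaches a user-space buffer; the price is at least two system calls per 64 KB chunk.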
>
>Although splice looks promising, it suffers from too many context
>switches. I had the best results with direct I/O and synchronized
>writes for buffer sizes of 8 MB and up, but I found this too complex
>and probably system dependent. So I settled on the memory-mapped
>version, which I rewrote using Phobos instead of POSIX calls, so it
>should run equally well on all platforms. At its core it is 5 lines
>of code:
>
>----------------------------------------------------------------------
>
>import std.datetime.stopwatch : AutoStart, StopWatch;
>import std.mmfile : MmFile;
>import std.stdio : stderr, writefln;
>
>void main(string[] args)
>{
>    if (args.length != 3)
>    {
>        stderr.writefln("Usage: %s SOURCE DEST", args[0]);
>        return;
>    }
>
>    auto sw = StopWatch(AutoStart.yes);
>
>    // Map the source read-only and create a destination of the same size.
>    auto src = new MmFile(args[1], MmFile.Mode.read, 0, null, 0);
>    auto dst = new MmFile(args[2], MmFile.Mode.readWriteNew, src.length,
>                          null, 0);
>
>    // The actual copy: one memcpy between the two mappings.
>    auto data = dst[];
>    data[] = src[];
>    dst.flush();
>
>    sw.stop();
>    immutable usecs = sw.peek.total!"usecs";
>    writefln("Copied %s bytes in %s msec (%s kB/s)", src.length,
>             usecs / 1000,
>             usecs > 0 ? 1_000_000 * src.length / (1024 * usecs) : 0);
>}
>
>----------------------------------------------------------------------
>
>This leaves it up to the kernel to decide how to interleave disk
>reads and writes.
>
>- Marco
I changed the threaded implementation a little so that it doesn't
allocate buffers dynamically:
https://gist.github.com/1188128
I hope I didn't screw up there. The idea is to have two buffers: while
one buffer is being filled from the source file, the other is being
written to the destination. When both the read and the write have
finished, the buffers are swapped.
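A minimal sketch of the same two-buffer idea in D, assuming std.parallelism's task/executeInNewThread for the background write. copyDoubleBuffered and writeChunk are hypothetical names and this is not the code from the gist. The two buffers are allocated once up front, and each iteration overlaps one write with one read before the buffers swap roles.

```d
import std.parallelism : task;
import std.stdio : File;

// Runs on a worker thread: writes one filled buffer to the destination.
void writeChunk(File* dst, const(ubyte)[] chunk)
{
    dst.rawWrite(chunk);
}

// Copies srcPath to dstPath using exactly two buffers: while one buffer
// is written on a worker thread, the other is filled from the source.
void copyDoubleBuffered(string srcPath, string dstPath,
                        size_t bufSize = 8 * 1024 * 1024)
{
    auto src = File(srcPath, "rb");
    auto dst = File(dstPath, "wb");

    // The only buffers the copy loop ever uses; no per-chunk allocation.
    ubyte[][2] bufs = [new ubyte[bufSize], new ubyte[bufSize]];
    size_t cur = 0;

    // Prime the pipeline: fill the first buffer before the loop starts.
    ubyte[] filled = src.rawRead(bufs[cur]);

    while (filled.length > 0)
    {
        // Write the filled buffer in the background...
        auto writer = task!writeChunk(&dst, filled);
        writer.executeInNewThread();

        // ...while the other buffer is filled at the same time.
        cur ^= 1;
        filled = src.rawRead(bufs[cur]);

        // Both the read and the write are done: the buffers swap roles.
        writer.yieldForce();
    }
    // dst is flushed and closed when it goes out of scope here.
}
```

Starting one short-lived thread per chunk keeps the sketch small; with 8 MB buffers a 512 MB copy starts only 64 threads, so the overhead should be small next to disk seeks. A long-lived writer thread fed through a queue would avoid even that.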
--
Johannes Pfau