Fast file copy (threaded or not?)

Marco Leise Thu, 01 Sep 2011 13:15:26 -0700

I have split the discussion with Andrei about the benefit of a multi-threaded file copy routine into its own thread. This is about copying a file from and to the same HDD - a mechanical disk with seek times.

My testing showed that Andrei is correct in assuming that the kernel can optimize the small reads and writes of a multi-threaded application. I had to use large buffers of up to 64 MB with my "single-threaded, 100% synchronized writes" version to see the simple multi-threaded version from Johannes Pfau add 4.3% overhead during a 512 MB copy operation.
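
For readers who want something concrete to compare against, here is a minimal sketch of a single-threaded copy loop with a configurable buffer size. It is not the code that was measured: the name bufferedCopy and the 8 MB default are made up for illustration, and it uses plain Phobos file I/O without the synchronized writes mentioned above.

```d
import std.stdio;

// Hypothetical baseline: copy srcPath to dstPath through one reusable
// buffer and return the number of bytes copied. No O_SYNC / direct I/O.
ulong bufferedCopy(string srcPath, string dstPath,
                   size_t bufSize = 8 * 1024 * 1024)
{
    auto src = File(srcPath, "rb");
    auto dst = File(dstPath, "wb");
    auto buffer = new ubyte[](bufSize);
    ulong total = 0;

    while (true)
    {
        // rawRead returns the slice of buffer that was actually filled;
        // an empty slice means end of file.
        auto chunk = src.rawRead(buffer);
        if (chunk.length == 0)
            break;
        dst.rawWrite(chunk);
        total += chunk.length;
    }
    return total;
}

void main(string[] args)
{
    if (args.length != 3)
    {
        stderr.writefln("%s SOURCE DEST", args[0]);
        return;
    }
    writefln("Copied %s bytes", bufferedCopy(args[1], args[2]));
}
```

Varying bufSize is the only knob here; whether large values help will depend on the disk and the kernel.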


Some more things I've experimented with:

- using only system API calls instead of D wrappers:
  The difference is close to background noise

- direct I/O for writing as used by databases:
  This worked pretty well, but you may not want to use it for
  reading as it bypasses the file cache. A file that is already
  cached would be copied slower as a result.

- memory maps:
  Kernel memory is shared with userspace. This approach does
  not allocate memory in the application. It just makes pages
  of files directly accessible in user space. Once mapped, the
  whole copy operation comes down to a single 'memcpy' call.

- splice (zero-copy):
  This is a Linux system call that allows memory operations
  inside the kernel to be controlled from user space. The benefit
  is that the CPU never copies this memory from kernel to
  user space. Unfortunately the copy operation goes like this:
  "source file -> pipe, pipe -> destination file"
  A pipe has a 64 KB buffer by default. So it is not easy to move
  large chunks of data in a single call to splice(). 512 MB are
  still divided into 16,000+ calls.
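
The call count for splice can be checked with a little arithmetic. The sketch below assumes that each call moves at most one 64 KB pipe buffer and that every chunk needs two calls (file -> pipe, then pipe -> file); the helper name spliceCalls is made up for illustration.

```d
import std.stdio;

// Hypothetical helper: number of splice() calls needed when every chunk
// goes file -> pipe and then pipe -> file, and each call moves at most
// one pipe buffer's worth of data.
ulong spliceCalls(ulong fileSize, ulong pipeSize)
{
    immutable chunks = (fileSize + pipeSize - 1) / pipeSize; // round up
    return 2 * chunks;
}

void main()
{
    enum ulong fileSize = 512UL * 1024 * 1024; // 512 MB file
    enum ulong pipeSize = 64UL * 1024;         // assumed pipe capacity
    writefln("%s splice() calls", spliceCalls(fileSize, pipeSize));
    // prints "16384 splice() calls"
}
```

That is 8192 chunks with two calls each.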

Although splice looks promising, it suffers from too many context switches. I had the best results with direct I/O and synchronized writes for buffer sizes from 8 MB onwards, but I found this to be too complex and probably system-dependent. So I settled on the memory-mapped version, which I rewrote using Phobos instead of POSIX calls, so it should run equally well on all platforms. It is 5 lines of code at its core:


----------------------------------------------------------------------

import std.datetime, std.exception, std.stdio, std.mmfile;

void main(string[] args)
{
    if (!enforce(args.length == 3, {
        stderr.writefln("%s SOURCE DEST", args[0]);
    })) return;

    auto sw = StopWatch();
    sw.start();

    auto src = new MmFile(args[1], MmFile.Mode.Read, 0, null, 0);

    auto dst = new MmFile(args[2], MmFile.Mode.ReadWriteNew, src.length, null, src.length);

    auto data = dst[];
    data[] = src[];
    dst.flush();

    sw.stop();

    writefln("Copied %s bytes in %s msec (%s kB/s)", src.length, sw.peek().msecs,
            1_000_000 * src.length / (1024 * sw.peek().usecs));
}

----------------------------------------------------------------------

This leaves it up to the kernel how to interleave disk reads and writes.

- Marco

dcopy.d
Description: Binary data
