My testing showed that Andrei is correct in assuming that the kernel can optimize small reads and writes in a multi-threaded application. I had to use large buffers of up to 64 MB with my "single-threaded, 100% synchronized writes" version to see the simple multi-threaded version from Johannes Pfau add 4.3% overhead during a 512 MB copy operation.
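For anyone who wants to repeat the buffer size experiment, here is a minimal sketch of a single-threaded buffered copy. It is not the code I benchmarked: the name bufferedCopy and the bufferSize parameter are made up for illustration, and it does not force synchronized writes, so treat it only as a baseline to vary the buffer size with.

```d
import std.stdio;

/// Copies srcPath to dstPath through one reusable buffer and returns the
/// number of bytes written. bufferSize is a hypothetical tuning knob.
ulong bufferedCopy(string srcPath, string dstPath, size_t bufferSize = 64 * 1024)
{
    auto src = File(srcPath, "rb");
    auto dst = File(dstPath, "wb");
    ulong total = 0;

    // byChunk reads up to bufferSize bytes at a time into a reused buffer.
    foreach (ubyte[] chunk; src.byChunk(bufferSize))
    {
        dst.rawWrite(chunk);
        total += chunk.length;
    }
    return total;
}

void main(string[] args)
{
    if (args.length != 3)
    {
        stderr.writefln("%s SOURCE DEST", args[0]);
        return;
    }
    writefln("Copied %s bytes", bufferedCopy(args[1], args[2]));
}
```

Passing a larger third argument to bufferedCopy (for example 8 * 1024 * 1024) is the easiest way to see how the buffer size affects throughput on a given system.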
Some more things I've experimented with:

- Using only system API calls instead of D wrappers: The difference is close to background noise.
- Direct I/O for writing, as used by databases: This worked pretty well, but you may not want to use it for reading because it bypasses the file cache. A file that is already cached would be copied more slowly as a result.
- Memory maps: Kernel memory is shared with user space. This approach does not allocate memory in the application. It just makes pages of files directly accessible in user space. Once mapped, the whole copy operation comes down to a single 'memcpy' call.
- splice (zero-copy): This is a Linux system call that allows memory operations inside the kernel to be controlled from user space. The benefit is that the CPU never copies this memory from kernel to user space. Unfortunately the copy operation goes like this: "source file -> pipe, pipe -> destination file". A pipe is a 64 KB buffer by default, so it is not easy to move large chunks of data in a single call to splice(). 512 MB are still divided into 16,000+ calls. Although splice looks promising, it suffers from too many context switches.

I had the best results with direct I/O and synchronized writes for buffer sizes from 8 MB onwards, but I found this to be too complex and probably system dependent. So I settled on the memory-mapped version, which I rewrote using Phobos instead of POSIX calls, so it should run equally well on all platforms and is 5 lines of code at its core:
----------------------------------------------------------------------
import std.datetime, std.exception, std.stdio, std.mmfile;

void main(string[] args)
{
    // Print a usage line and exit unless exactly SOURCE and DEST are given.
    if (!enforce(args.length == 3, {
        stderr.writefln("%s SOURCE DEST", args[0]);
    })) return;

    auto sw = StopWatch();
    sw.start();

    // Map the source read-only and create the destination at the same size.
    auto src = new MmFile(args[1], MmFile.Mode.Read, 0, null, 0);
    auto dst = new MmFile(args[2], MmFile.Mode.ReadWriteNew, src.length,
                          null, src.length);

    // The copy itself: one slice assignment between the two mappings.
    auto data = dst[];
    data[] = src[];
    dst.flush();

    sw.stop();
    writefln("Copied %s bytes in %s msec (%s kB/s)", src.length,
             sw.peek().msecs,
             1_000_000 * src.length / (1024 * sw.peek().usecs));
}
----------------------------------------------------------------------
This leaves it up to the kernel how to interleave disk reads and writes.
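As a footnote to the splice numbers: the call count can be checked with a few lines of arithmetic. This is only a sketch under my own assumptions, namely that every splice() call moves exactly one full pipe buffer and that each chunk costs two calls (one into the pipe, one out of it). The helper name spliceCalls is made up for illustration.

```d
import std.stdio;

enum ulong KiB = 1024;
enum ulong MiB = 1024 * KiB;

/// Estimated number of splice() calls for a copy routed through a pipe,
/// assuming each call moves one full pipe buffer (hypothetical helper).
ulong spliceCalls(ulong fileSize, ulong pipeSize)
{
    immutable chunks = (fileSize + pipeSize - 1) / pipeSize; // round up
    return 2 * chunks; // file -> pipe, then pipe -> file
}

void main()
{
    writeln(spliceCalls(512 * MiB, 64 * KiB)); // prints 16384
}
```

A larger pipe buffer would reduce the count proportionally, but every chunk still pays for two trips into the kernel.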
- Marco
