On Thursday, September 01, 2011 13:13 Marco Leise wrote:
> I split the discussion with Andrei about the benefit of a multi-threaded
> file copy routine into its own thread.
> This is about copying a file from and to the same HDD - a mechanical disk
> with seek times.
> 
> My testing showed that Andrei is correct in assuming that the kernel
> can optimize the small reads and writes of a multi-threaded
> application. I had to use large buffers - up to 64 MB - with my
> "single-threaded, 100% synchronized writes" version to see the simple
> multi-threaded version from Johannes Pfau add 4.3% overhead during a
> 512 MB copy operation.
> 
> Some more things I've experimented with:
> 
> - using only system API calls instead of D wrappers:
>   The difference is close to background noise.
> 
> - direct I/O for writing, as used by databases:
>   This worked pretty well, but you may not want to use it for
>   reading, as it bypasses the file cache: a file that is already
>   cached would be copied more slowly as a result. (A sketch follows
>   this list.)
> 
> - memory maps:
>   Kernel memory is shared with user space. This approach does
>   not allocate memory in the application; it just makes pages
>   of files directly accessible in user space. Once mapped, the
>   whole copy operation comes down to a single 'memcpy' call.
> 
> - splice (zero-copy):
>   This is a Linux system call that allows memory operations inside
>   the kernel to be controlled from user space. The benefit is
>   that the CPU never copies this memory from kernel to
>   user space. Unfortunately the copy operation goes like this:
>   "source file -> pipe, pipe -> destination file"
>   A pipe is a hard-coded 64 KB buffer, so it is not easy to move
>   large chunks of data in a single call to splice(): 512 MB are
>   still divided into 16,000+ calls. (This is also sketched below.)
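> 
> To make the direct I/O idea concrete, here is a minimal sketch of the
> technique - not the exact code I benchmarked. It assumes Linux on
> x86/x86-64 (the numeric value of O_DIRECT differs per architecture),
> the directWrite() helper name is made up for the example, and padding
> of the final, possibly unaligned chunk is glossed over:
> 
> ----------------------------------------------------------------------
> 
> import std.conv : octal;
> import std.exception : errnoEnforce;
> import std.string : toStringz;
> import core.sys.posix.fcntl;   // open, O_WRONLY, O_CREAT, O_TRUNC
> import core.sys.posix.unistd;  // write, close
> import core.sys.posix.stdlib;  // posix_memalign
> import core.stdc.stdlib : free;
> import core.stdc.string : memcpy;
> 
> // O_DIRECT is Linux-specific and not in the POSIX bindings; this is
> // the x86/x86-64 value - check <fcntl.h> on other targets.
> enum O_DIRECT = 0x4000;
> 
> // Write 'data' to 'path', bypassing the page cache. O_DIRECT requires
> // buffer address, file offset and transfer size to be sector aligned,
> // so everything is staged through one aligned bounce buffer.
> void directWrite(string path, const(ubyte)[] data)
> {
>     enum blockSize = 8 * 1024 * 1024; // 8 MB onwards worked best for me
> 
>     int fd = open(toStringz(path), O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT,
>                   octal!644);
>     errnoEnforce(fd != -1, "open failed");
>     scope (exit) close(fd);
> 
>     void* buf;
>     errnoEnforce(posix_memalign(&buf, 512, blockSize) == 0);
>     scope (exit) free(buf);
> 
>     for (size_t pos = 0; pos < data.length; pos += blockSize)
>     {
>         auto len = data.length - pos < blockSize ? data.length - pos
>                                                  : blockSize;
>         memcpy(buf, data.ptr + pos, len);
>         // A real implementation must pad the last chunk up to a
>         // multiple of 512 bytes and truncate the file afterwards.
>         errnoEnforce(write(fd, buf, len) == len, "write failed");
>     }
> }
> 
> ----------------------------------------------------------------------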
> 
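> Likewise, a minimal sketch of the splice loop - again Linux-only. As
> far as I know druntime ships no splice() binding, so the prototype is
> declared by hand from the splice(2) man page, and the spliceCopy()
> helper name is made up for the example:
> 
> ----------------------------------------------------------------------
> 
> import std.conv : octal;
> import std.exception : errnoEnforce;
> import std.string : toStringz;
> import core.sys.posix.fcntl;   // open, O_RDONLY, O_WRONLY, ...
> import core.sys.posix.unistd;  // pipe, close
> 
> // Hand-written declaration; see the splice(2) man page.
> extern (C) ptrdiff_t splice(int fd_in, long* off_in, int fd_out,
>                             long* off_out, size_t len, uint flags);
> enum SPLICE_F_MOVE = 1; // "move pages instead of copying", <fcntl.h>
> 
> // Copy src to dst without the data ever entering user space. Every
> // chunk travels "source file -> pipe, pipe -> destination file".
> void spliceCopy(string src, string dst)
> {
>     int fdIn  = open(toStringz(src), O_RDONLY);
>     int fdOut = open(toStringz(dst), O_WRONLY | O_CREAT | O_TRUNC,
>                      octal!644);
>     errnoEnforce(fdIn != -1 && fdOut != -1, "open failed");
>     scope (exit) { close(fdIn); close(fdOut); }
> 
>     int[2] p;
>     errnoEnforce(pipe(p) == 0, "pipe failed");
>     scope (exit) { close(p[0]); close(p[1]); }
> 
>     for (;;)
>     {
>         // At most 64 KB fits into the pipe at once - hence the
>         // 16,000+ calls for a 512 MB file.
>         auto n = splice(fdIn, null, p[1], null, 64 * 1024, SPLICE_F_MOVE);
>         errnoEnforce(n >= 0, "splice to pipe failed");
>         if (n == 0) break; // end of file
>         // A robust version would loop here on short writes.
>         errnoEnforce(splice(p[0], null, fdOut, null, cast(size_t) n,
>                             SPLICE_F_MOVE) == n, "splice from pipe failed");
>     }
> }
> 
> ----------------------------------------------------------------------
> 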
> Although splice looks promising, it suffers from too many context
> switches. I had the best results with direct I/O and synchronized writes
> for buffer sizes from 8 MB onwards, but I found this too complex and
> probably system dependent. So I settled on the memory-mapped version,
> which I rewrote using Phobos instead of POSIX calls, so it should run
> equally well on all platforms and is 5 lines of code at its core:
> 
> ----------------------------------------------------------------------
> 
> import std.datetime, std.exception, std.stdio, std.mmfile;
> 
> void main(string[] args)
> {
>     // Print usage and bail out unless exactly SOURCE and DEST are given.
>     if (!enforce(args.length == 3, {
>         stderr.writefln("%s SOURCE DEST", args[0]);
>     })) return;
> 
>     auto sw = StopWatch();
>     sw.start();
> 
>     // Map the source read-only and the destination read-write at the
>     // same length; the copy is then a single array copy over the maps.
>     auto src = new MmFile(args[1], MmFile.Mode.Read, 0, null, 0);
>     auto dst = new MmFile(args[2], MmFile.Mode.ReadWriteNew, src.length,
>                           null, src.length);
>     auto data = dst[];
>     data[] = src[];
>     dst.flush();
> 
>     sw.stop();
>     writefln("Copied %s bytes in %s msec (%s kB/s)", src.length,
>              sw.peek().msecs,
>              1_000_000 * src.length / (1024 * sw.peek().usecs));
> }
> 
> ----------------------------------------------------------------------
> 
> This leaves it up to the kernel how to interleave disk reads and writes.

I would point out that regardless of what happens with performance with
synchronous vs. asynchronous I/O on a single HDD, it's pretty much a
guarantee that in the general case asynchronous I/O is going to be faster
when dealing with different HDDs. So, while we should definitely get hard
data, unless copying asynchronously on a single hard drive is significantly
worse than copying synchronously, it's pretty much a given that we'd want
to go with asynchronous I/O by default. If synchronous I/O were found to be
significantly better on a single HDD, that would make the question much
more interesting, but as long as asynchronous I/O is at least close to - if
not better than - synchronous I/O on the same HDD, asynchronous I/O is the
way to go.

- Jonathan M Davis
