Many readers of the list will know who I am, but for those who don't, let me start by mentioning that I'm the lead developer and designer of Ardour, an open source GPL'ed DAW for Linux, OS X and Windows, and also the original author of JACK, a cross-platform API for low latency, realtime audio and MIDI, including inter-application and network routing.
Over in Ardour-land, we've wrestled for a couple of years now with less
than ideal support for OS X and are just finally beginning to resolve
many/most of the areas where we were falling down.
But there's an area where we've really run into a brick wall, despite the
extremely deep and amazingly talented pool of people we have as developers,
people who know *nix operating systems inside out and upside down.
That area is: getting acceptable disk I/O bandwidth on OS X when reading
many files. For comparison: we can easily handle 1024 tracks on a single
spinning disk on Linux. We can get close to similar performance on Windows.
We can even (gasp!) get in the same ballpark on Windows running as a guest
OS on OS X (i.e. the entire Windows filesystem is just one file from OS X's
perspective).
But on OS X itself, we are unable to get consistent, reliable performance
when reading many files at once. Our code is, of course, entirely
cross-platform, and so we are not using any Apple specific APIs (or
Linux-specific or Windows-specific). The same code works well for recording
(writing) files (though OS X does still perform worse than other platforms).
We've used various dtrace-based tools to try to analyse what is going on:
these only confirmed for us that we have a problem, but provided no
insight into its cause.
Things became so inexplicable that we decided to break out the problem from
Ardour itself and write a small test application that would let us easily
collect data from many different systems. We've now tested on the order of
a half-dozen different OS X systems, and they all show the same basic bad
behaviour:
* heavy dependence of sustained streaming bandwidth on the number of
  files being read (i.e. sustained streaming bandwidth is high when
  reading 10 files, but can be very low when reading 128 files; this
  dependence is low on Windows and non-existent on Linux)
* periodic drops in sustained streaming bandwidth of as much as a factor
  of 50, which can last for several seconds (e.g. a disk that can peak at
  100MB/sec fails to deliver better than 5MB/sec for a noticeable period)
* a requirement to read larger blocksizes to get the same bandwidth
  as on other platforms
Our test application is small, less than 130 lines of code. It uses the
POSIX API to read a specified blocksize from each of N files, and reports
on the observed I/O bandwidth. It comes with a companion shell script
(about 60 lines) which sets up the files to be read and then runs the
executable with each of a series of (specified) blocksizes.
I've attached both files (source code for the executable, plus the shell
script). The executable has an unfortunate reliance right now on glib in
order to get the cross-platform g_get_monotonic_time(). If you want to run
them, the build command is at the top of the executable, and then you would
just do:
./run-readtest.sh -d FOLDER-FOR-TEST-FILES -n NUMBER-OF-FILES -f
FILESIZE-IN-BYTES blocksize1 blocksize2 ....
(You'll also need some pretty large-ish chunk of diskspace available for
the test files). The blocksize list is typically powers of 2 starting at
16384 and going up to 4MB. We only see problems when NUMBER-OF-FILES is on
the order of 128 or above. To be a useful test, the filesizes need to be at
least 10MB or so. The full test takes several (2-20) minutes depending on
the overall speed of your disk.
But you don't have to run them: you could just read the source code of the
executable to see if any lightbulbs go off. Why would this code fail so
badly once the number of files gets up to 100+? Why can we predictably
make any version of OS X able to get barely 5MB/sec from a disk that can
sustain 100MB/sec? Are there some tricks to making this sort of thing work
on OS X that anyone is aware of? Keep in mind that this same code works
reliably, solidly and predictably on Windows and Linux (and even works
reliably on Windows as a guest OS on OS X).
We do know, by simple inspection, that other DAWs, from Reaper to Logic,
appear to have solved this problem. Alas, unlike Ardour, they don't
generally share their code, so we have no idea whether they did something
clever, or whether we are doing something stupid.
I, and the rest of the Ardour community, would be very grateful for any
insights anyone can offer.
thanks,
--p
/* gcc -o readtest readtest.c `pkg-config --cflags --libs glib-2.0` -lm */
#include <stdlib.h>
#include <errno.h>
#include <stdio.h>
#include <unistd.h>
#include <stdint.h>
#include <string.h>
#include <getopt.h>
#include <fcntl.h>
#include <limits.h> /* PATH_MAX */
#include <math.h>
#include <glib.h>
char* data = 0;

void
usage ()
{
	fprintf (stderr, "readtest [ -b BLOCKSIZE ] [-l FILELIMIT] [ -D ] filename-template\n");
}

int
main (int argc, char* argv[])
{
	int* files;
	char optstring[] = "b:Dl:q";
	uint32_t block_size = 64 * 1024 * 4;
	int max_files = -1;
#ifdef __APPLE__
	int direct = 0;
#endif
	const struct option longopts[] = {
		{ "blocksize", 1, 0, 'b' },
		{ "direct", 0, 0, 'D' },
		{ "limit", 1, 0, 'l' },
		{ 0, 0, 0, 0 }
	};
	int option_index = 0;
	int c = 0;
	char const * name_template = 0;
	int flags = O_RDONLY;
	int n = 0;
	int nfiles = 0;
	int quiet = 0;

	while (1) {
		if ((c = getopt_long (argc, argv, optstring, longopts, &option_index)) == -1) {
			break;
		}
		switch (c) {
		case 'b':
			block_size = atoi (optarg);
			break;
		case 'l':
			max_files = atoi (optarg);
			break;
		case 'D':
#ifdef __APPLE__
			direct = 1;
#endif
			break;
		case 'q':
			quiet = 1;
			break;
		default:
			usage ();
			return 0;
		}
	}

	if (optind < argc) {
		name_template = argv[optind];
	} else {
		usage ();
		return 1;
	}

	/* count how many files match the printf-style filename template */
	while (1) {
		char path[PATH_MAX+1];
		snprintf (path, sizeof (path), name_template, n+1);
		if (access (path, R_OK) != 0) {
			break;
		}
		++n;
		if (max_files > 0 && n >= max_files) {
			break;
		}
	}

	if (n == 0) {
		fprintf (stderr, "No matching files found for %s\n", name_template);
		return 1;
	}

	if (!quiet) {
		printf ("# Discovered %d files using %s\n", n, name_template);
	}

	nfiles = n;
	files = (int *) malloc (sizeof (int) * nfiles);

	for (n = 0; n < nfiles; ++n) {
		char path[PATH_MAX+1];
		int fd;

		snprintf (path, sizeof (path), name_template, n+1);

		if ((fd = open (path, flags, 0644)) < 0) {
			fprintf (stderr, "Could not open file #%d @ %s (%s)\n", n, path, strerror (errno));
			return 1;
		}

#ifdef __APPLE__
		if (direct) {
			/* Apple man pages say only that it returns "a value other than -1 on success",
			   which probably means zero, but you just can't be too careful with
			   those guys.
			*/
			if (fcntl (fd, F_NOCACHE, 1) == -1) {
				fprintf (stderr, "Cannot set F_NOCACHE on file #%d\n", n);
			}
		}
#endif
		files[n] = fd;
	}

	data = (char*) malloc (sizeof (char) * block_size);

	uint64_t _read = 0;
	double max_elapsed = 0;
	double total_time = 0;
	double var_m = 0;
	double var_s = 0;
	uint64_t cnt = 0;

	while (1) {
		gint64 before;

		before = g_get_monotonic_time();

		/* read one block from every file in turn; a short read (EOF) ends the test */
		for (n = 0; n < nfiles; ++n) {
			if (read (files[n], (char*) data, block_size) != (ssize_t) block_size) {
				goto out;
			}
		}

		_read += block_size;

		gint64 elapsed = g_get_monotonic_time() - before;
		double bandwidth = ((double) nfiles * block_size / 1048576.0) / (elapsed/1000000.0);

		if (!quiet) {
			printf ("# BW @ %lu %.3f seconds bandwidth %.4f MB/sec\n", (long unsigned int)_read, elapsed/1000000.0, bandwidth);
		}

		total_time += elapsed;
		++cnt;

		/* running mean/variance of per-pass elapsed time (Welford's method) */
		if (max_elapsed == 0) {
			var_m = elapsed;
		} else {
			const double var_m1 = var_m;
			var_m = var_m + (elapsed - var_m) / (double)(cnt);
			var_s = var_s + (elapsed - var_m) * (elapsed - var_m1);
		}
		if (elapsed > max_elapsed) {
			max_elapsed = elapsed;
		}
	}

  out:
	if (max_elapsed > 0 && total_time > 0) {
		double stddev = cnt > 1 ? sqrt(var_s / ((double)(cnt-1))) : 0;
		double bandwidth = ((double) nfiles * _read / 1048576.0) / (total_time/1000000.0);
		double min_throughput = ((double) nfiles * block_size / 1048576.0) / (max_elapsed/1000000.0);

		printf ("# Min: %.4f MB/sec Avg: %.4f MB/sec || Max: %.3f sec \n", min_throughput, bandwidth, max_elapsed/1000000.0);
		printf ("# Max Track count: %d @ 48000SPS\n", (int) floor(1048576.0 * bandwidth / (4 * 48000.)));
		printf ("# Sus Track count: %d @ 48000SPS\n", (int) floor(1048576.0 * min_throughput / (4 * 48000.)));
		printf ("%d %.4f %.4f %.4f %.5f\n", block_size, min_throughput, bandwidth, max_elapsed/1000000.0, stddev/1000000.0);
	}
	return 0;
}
[Attachment: run-readtest.sh (Bourne shell script)]
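For readers of the archive who can't retrieve the attachment, here is a hypothetical sketch of what run-readtest.sh does, reconstructed from the description earlier in the message. It is not the real script: the actual one is about 60 lines and takes -d/-n/-f option flags, whereas this simplified version uses positional arguments (directory, file count, file size, then a list of blocksizes).

```shell
#!/bin/sh
# Hypothetical sketch of run-readtest.sh, reconstructed from the message
# above. Usage: ./sketch.sh [DIR] [NFILES] [FILESIZE] [blocksize ...]
DIR=${1:-/tmp/readtest-files}
NFILES=${2:-8}
FILESIZE=${3:-1048576}

# everything after the third argument is treated as a blocksize to test
BLOCKSIZES=""
i=0
for arg in "$@"; do
    i=$((i + 1))
    [ "$i" -gt 3 ] && BLOCKSIZES="$BLOCKSIZES $arg"
done

# create the test files; names match readtest's printf-style template
mkdir -p "$DIR"
i=1
while [ "$i" -le "$NFILES" ]; do
    dd if=/dev/zero of="$DIR/testfile$i" bs="$FILESIZE" count=1 2>/dev/null
    i=$((i + 1))
done

# run the executable once per blocksize (skipped if it has not been built)
for bs in $BLOCKSIZES; do
    if [ -x ./readtest ]; then
        ./readtest -b "$bs" "$DIR/testfile%d"
    else
        echo "readtest binary not found; skipping blocksize $bs" >&2
    fi
done
echo "created $NFILES files of $FILESIZE bytes in $DIR"
```

The real script presumably also cleans up or reuses existing test files; consult the actual attachment before drawing conclusions from this sketch.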
Coreaudio-api mailing list ([email protected])
