Well, I finally found some time to read your thesis. The thing I really didn't
like about all the modular synth systems I have tried (Pd, Csound, Om,
probably more...) is that performance sucked.

I see that you put a good deal of effort into reusing signal buffers
to make the best use of the memory cache, which is very cool. When I talk to some
developers of other projects, they just don't get it.

For a while I played around with writing a piece of synth software called Dubnium
<http://bitglue.com/dubnium>. I never developed it enough to actually be
useful, but I did learn some things, anyway.

I think the most original thing about dubnium was that the processing
units were not functions which took pointers to input and output buffers
and did their magic, but instead were chunks of C code which would be
inlined with the code from other modules, then compiled with gcc.

My reasoning for this was simple: many of the processing units in synths
are very simple. Maybe they just multiply two inputs to get the output,
or maybe they are only marginally more complex, like a simple one-pole IIR filter.
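To make "marginally more complex" concrete, here is a minimal sketch of a one-pole lowpass written as a per-sample unit. It assumes the common form y[n] = (1 - a) x[n] + a y[n-1] with 0 <= a < 1; the struct and function names are hypothetical. The per-sample `onepole_tick` is the kind of body a Dubnium-style system would inline, while `onepole_process` is the conventional block-based wrapper.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical one-pole lowpass unit:
       y[n] = (1 - a) * x[n] + a * y[n-1],  0 <= a < 1
   The only state carried between samples is the previous output. */
struct onepole {
    float a;       /* pole coefficient */
    float y_prev;  /* y[n-1] */
};

/* Per-sample body: small enough to be inlined into a larger loop. */
static inline float onepole_tick(struct onepole *s, float x)
{
    float y = (1.0f - s->a) * x + s->a * s->y_prev;
    s->y_prev = y;
    return y;
}

/* The same unit wrapped in the usual block-based interface. */
static void onepole_process(struct onepole *s, const float *in, float *out,
                            size_t buffer_size)
{
    assert(s->a >= 0.0f && s->a < 1.0f);
    for (size_t i = 0; i < buffer_size; i++)
        out[i] = onepole_tick(s, in[i]);
}
```

Feeding an impulse through it with a = 0.5 gives 0.5, 0.25, 0.125, ... on successive samples.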

Most implementations might go something like this (in C):

#include <stddef.h>

void multiply(const float *in1, const float *in2, float *out,
              size_t buffer_size)
{
    for (size_t i = 0; i < buffer_size; i++)
        out[i] = in1[i] * in2[i];
}

Most people look at this and figure that the overhead, i.e. the CPU time
spent doing something other than the multiplication, is just one function
call per block of samples. But, as I'd guess you already know, there is
also the overhead of a store and a load *per sample* for every buffer that
connects two units: the producer writes each sample to memory and the
consumer reads it back.
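A back-of-the-envelope count, under simplifying assumptions: every unit has two inputs and one output, the units form a tree (so k units have k + 1 external inputs and k - 1 internal connections, as in the 4-input multiply tree further down), and each unit output gets its own buffer. Cache behaviour is ignored. Per sample:

```latex
\begin{aligned}
\text{buffered:}\quad & 2k \text{ loads} + k \text{ stores} \\
\text{fused:}\quad & (k+1) \text{ loads} + 1 \text{ store} \\
\text{saved:}\quad & (k-1) \text{ loads} + (k-1) \text{ stores}
\end{aligned}
```

For the 4-input tree (k = 3) that is 9 memory accesses per sample versus 5; the savings are exactly one store and one load per intermediate buffer.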

Admittedly, dubnium was not very clever about reusing sample buffers, so it
probably spent more time on cache misses than nova would. Regardless, I
found that by rolling multiple processing units into one, without any
memory buffers between them, the performance improvement was huge. Really,
really huge.
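A minimal sketch of what "rolling units into one" means, using a hypothetical two-stage chain (a * b) * c. The names and the `MAX_BLOCK` limit are illustrative assumptions, and no timings are claimed here; the point is only where the intermediate values live.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_BLOCK 64  /* hypothetical maximum block size */

/* Conventional block-based unit: loads both inputs and stores the
   output for every sample. */
static void multiply_block(const float *in1, const float *in2, float *out,
                           size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = in1[i] * in2[i];
}

/* Unfused: (a * b) * c through an intermediate buffer, so every
   intermediate sample is stored to tmp and then loaded back. */
static void chain_buffered(const float *a, const float *b, const float *c,
                           float *out, size_t n)
{
    float tmp[MAX_BLOCK];
    assert(n <= MAX_BLOCK);
    multiply_block(a, b, tmp, n);
    multiply_block(tmp, c, out, n);
}

/* Fused: the same two units rolled into one loop; the intermediate
   product is a local variable the compiler can keep in a register. */
static void chain_fused(const float *a, const float *b, const float *c,
                        float *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        float tmp = a[i] * b[i];
        out[i] = tmp * c[i];
    }
}
```

Both functions compute the same result; the fused one simply has no `tmp` array to write and re-read. How much that buys in practice depends on the compiler, the block size and the cache.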

As I mentioned, dubnium would combine the C code provided by the processing
units and feed it to gcc. Basically, each input and output of each
processing unit was declared as a local variable in one function, which
became the aggregation of some subgraph of processing units. So, if you had
a patch which multiplied 4 numbers using a tree of "multiply" widgets with
2 inputs each, it would generate code something like this:

the graph:

  in1         in2   in3        in4
     \       /         \       /
      [mult1]           [mult2]
             \         /
               [mult3]
                  |
                 out

the generated code:

#include <stddef.h>

static inline void multiply(float in1, float in2, float *out)
{
    // the body of this function is provided by the
    // processing unit implementation
    *out = in1 * in2;
}

void aggregation(const float *in1, const float *in2,
                 const float *in3, const float *in4,
                 float *out, size_t buffer_size)
{
    float
        mult1_in1, mult1_in2, mult1_out,
        mult2_in1, mult2_in2, mult2_out,
        mult3_in1, mult3_in2, mult3_out;

    for (size_t i = 0; i < buffer_size; i++) {
        mult1_in1 = in1[i];
        mult1_in2 = in2[i];

        multiply(mult1_in1, mult1_in2, &mult1_out);

        mult2_in1 = in3[i];
        mult2_in2 = in4[i];

        multiply(mult2_in1, mult2_in2, &mult2_out);

        mult3_in1 = mult1_out;
        mult3_in2 = mult2_out;

        multiply(mult3_in1, mult3_in2, &mult3_out);

        out[i] = mult3_out;
    }
}

gcc was smart enough to compile this into something close to optimal,
doing the communication between the modules just by leaving values in
registers, rather than writing them to memory only to load them back
again.
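The code generation step described above could be sketched roughly as follows. This is an illustration under assumptions, not Dubnium's actual code: `struct unit`, `emit_multiply` and `generate_loop` are hypothetical, units are assumed to be listed in dependency order, and each unit supplies its per-sample body as a small printer. Unlike the hand-written aggregation, it substitutes input expressions directly instead of copying them into per-input locals, which after optimization typically makes no difference.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical description of one unit instance in the patch. */
struct unit {
    const char *name;  /* instance name, e.g. "mult1" */
    const char *in1;   /* C expression feeding the first input */
    const char *in2;   /* C expression feeding the second input */
    /* Prints the unit's per-sample body, given its output variable and
       input expressions. */
    void (*emit_body)(char *dst, size_t cap, const char *out,
                      const char *a, const char *b);
};

/* Body snippet that a two-input "multiply" unit would supply. */
static void emit_multiply(char *dst, size_t cap, const char *out,
                          const char *a, const char *b)
{
    snprintf(dst, cap, "%s = %s * %s;", out, a, b);
}

/* Append s to dst, truncating rather than overflowing. */
static void append(char *dst, size_t cap, size_t *len, const char *s)
{
    size_t n = strlen(s);
    assert(*len < cap);
    if (*len + n >= cap)
        n = cap - *len - 1;
    memcpy(dst + *len, s, n);
    *len += n;
    dst[*len] = '\0';
}

/* Emit the sample loop of an aggregated function.  Each unit output
   becomes a local float <name>_out that downstream units read directly,
   so no intermediate buffers appear in the generated code. */
static void generate_loop(char *dst, size_t cap, const struct unit *units,
                          size_t n, const char *result)
{
    char out[64], body[192], line[320];
    size_t len = 0;

    assert(cap > 0);
    dst[0] = '\0';
    append(dst, cap, &len, "for (size_t i = 0; i < buffer_size; i++) {\n");
    for (size_t k = 0; k < n; k++) {
        snprintf(out, sizeof out, "%s_out", units[k].name);
        units[k].emit_body(body, sizeof body, out, units[k].in1, units[k].in2);
        snprintf(line, sizeof line, "    float %s;\n    %s\n", out, body);
        append(dst, cap, &len, line);
    }
    snprintf(line, sizeof line, "    out[i] = %s;\n}\n", result);
    append(dst, cap, &len, line);
}
```

Describing the 4-input tree as three units (mult1 on in1[i]/in2[i], mult2 on in3[i]/in4[i], mult3 on mult1_out/mult2_out) yields a loop in which mult3 reads its inputs straight from the locals. A real system would also need to emit the function signature, write the result to a .c file, run gcc on it and load the compiled object (for example with dlopen); none of that is shown here.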

So, as I said, it's hard for me to know whether something of this nature is
a worthwhile optimization, partially because I never got much past
implementing a monophonic subtractive synth in dubnium, and partially
because we optimized in different ways. However, I just wanted to throw the
idea out there for consideration.
_______________________________________________
nova-dev mailing list
[email protected]
http://klingt.org/cgi-bin/mailman/listinfo/nova-dev
http://tim.klingt.org/nova
