Well, I finally had some time to read your thesis. The thing I really didn't like about all the modular synth systems I have tried (Pd, Csound, Om, probably more...) is that performance sucked.
I see that you put a good deal of effort into reusing signal buffers for best usage of the memory cache, which is very cool. When I talk to some developers of other projects, they just don't get it.

For a while I played around with writing a software synth called Dubnium <http://bitglue.com/dubnium>. I never developed it enough to actually be useful, but I did learn some things anyway. I think the most original thing about dubnium was that the processing units were not functions which took pointers to input and output buffers and did their magic, but instead were chunks of C code which would be inlined with the code from other modules, then compiled with gcc.

My reasoning for this was simple: many of the processing units in synths are very simple. Maybe they just multiply two inputs to get the output, or maybe they are only marginally more complex, like a simple one-pole IIR. Most implementations might go something like this (in some sort of twisted pseudocode):

    multiply(float *in1, float *in2, float *out)
    {
        for i in 0..buffer_size {
            out[i] = in1[i] * in2[i]
        }
    }

Most people look at this and figure the overhead, or the cpu time spent doing something other than the multiplication, is just one function call for every block of samples. But as you already know, I'd guess, there is also the overhead of one load and one store *per sample*.

Admittedly, dubnium was not too clever about reusing sample buffers, so it probably spent more time on cache misses than nova would. Regardless, I found that by rolling multiple processing units into one, without any memory buffers between them, the performance improvement was huge. Really, really huge.

As I mentioned, dubnium would take the C code provided by the processing units, generate a combined source file, then feed it to gcc. What happened, basically, is that each input and output of each processing unit was declared as a local variable in one function that would become the aggregation of some subgraph of processing units.
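To make that per-sample memory traffic concrete, here is a minimal C sketch of the conventional buffer-passing style. It is not dubnium's or nova's actual code; the names BUFFER_SIZE, patch_buffered, tmp1 and tmp2 are made up for illustration, and the block size is an arbitrary assumption.

```c
#define BUFFER_SIZE 64  /* assumed block size, chosen arbitrarily */

/* Conventional processing unit: reads two buffers, writes one. */
static void multiply(const float *in1, const float *in2, float *out)
{
    for (int i = 0; i < BUFFER_SIZE; i++)
        out[i] = in1[i] * in2[i];
}

/* Three multiply units wired as a tree, connected through
 * intermediate buffers (hypothetical helper, for illustration). */
static void patch_buffered(const float *a, const float *b,
                           const float *c, const float *d, float *out)
{
    float tmp1[BUFFER_SIZE];  /* output of the first unit  */
    float tmp2[BUFFER_SIZE];  /* output of the second unit */

    multiply(a, b, tmp1);     /* every sample is stored to tmp1 ...     */
    multiply(c, d, tmp2);     /* ... and to tmp2 ...                    */
    multiply(tmp1, tmp2, out);/* ... only to be loaded right back here. */
}
```

There are only three function calls per block here, but every sample flowing along an internal connection is written to tmp1 or tmp2 and then read back by the next unit. That round trip through memory is the overhead that disappears when the units are merged into one loop.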
So, if you had a patch which would multiply 4 numbers with a tree of "multiply" widgets with 2 inputs each, it would generate code something like this.

The graph:

    in1   in2   in3   in4
      \   /       \   /
     [mult1]     [mult2]
          \       /
           [mult3]
              |
             out

The generated code:

    static inline void multiply(float in1, float in2, float *out)
    {
        // the body of this function is provided by the
        // processing unit implementation
        *out = in1 * in2;
    }

    void aggregation(float *in1, float *in2, float *in3, float *in4, float *out)
    {
        float mult1_in1, mult1_in2, mult1_out,
              mult2_in1, mult2_in2, mult2_out,
              mult3_in1, mult3_in2, mult3_out;

        for (int i = 0; i < buffer_size; i++) {
            mult1_in1 = in1[i];
            mult1_in2 = in2[i];
            multiply(mult1_in1, mult1_in2, &mult1_out);

            mult2_in1 = in3[i];
            mult2_in2 = in4[i];
            multiply(mult2_in1, mult2_in2, &mult2_out);

            // connections between units are plain local variables
            mult3_in1 = mult1_out;
            mult3_in2 = mult2_out;
            multiply(mult3_in1, mult3_in2, &mult3_out);

            out[i] = mult3_out;
        }
    }

gcc was smart enough to assemble this into something really optimal, doing the communication between the modules just by leaving values in the registers, rather than writing them to memory only to load them back again.

So as I say, it's hard for me to know if something of this nature is a worthwhile optimization or not, partly because I never got much past implementing a monophonic subtractive synth in dubnium, and partly because we optimized in different ways. However, I just wanted to throw the idea out there for consideration.

_______________________________________________
nova-dev mailing list
[email protected]
http://klingt.org/cgi-bin/mailman/listinfo/nova-dev
http://tim.klingt.org/nova
