Re: [nova-dev] Performance ideas

Tim Blechmann Wed, 26 Sep 2007 10:19:47 -0700

> For a while I played around with writing a synth software called Dubnium
> <http://bitglue.com/dubnium>. I never developed it enough to actually be
> useful, but I did learn some things, anyway.
> 
> I think the most original thing about dubnium was that the processing
> units were not functions which took pointers to input and output buffers
> and did their magic, but instead were chunks of C code which would be
> inlined with the code from other modules, then compiled with gcc.


i think i see your point ... the problem with this approach is that it
gets difficult to change the dsp graph dynamically ...

so if you add two signals and then apply a filter, the most
efficient way would probably be something like:
- get input1 and input2 from memory to registers
- add them
- apply the filter in the registers
- move the result from registers to memory ...

the less efficient way, which can be used in dynamic environments
(without a compile cycle) is something like:
- get input1 and input2 from memory to registers
- add them
- move the result from registers to memory
- get the result from memory to registers
- apply the filter
- move the result back to memory

when written as a nova patch, the second approach is the only usable one ...
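
a rough sketch of the two variants in plain c, assuming a one-pole
lowpass as the filter. the function names, the struct and the choice of
filter are made up for illustration; this is not nova or dubnium code:

```c
#include <stddef.h>

/* hypothetical one-pole lowpass, chosen only for illustration */
typedef struct { float a; float z; } onepole;

/* fused variant: the sum stays in a register and goes straight
 * into the filter, nothing is stored in between */
void add_filter_fused(onepole *f, const float *in1, const float *in2,
                      float *out, size_t n)
{
    float z = f->z;
    for (size_t i = 0; i < n; ++i) {
        float sum = in1[i] + in2[i];   /* never written to memory */
        z = z + f->a * (sum - z);
        out[i] = z;
    }
    f->z = z;
}

/* unfused variant: two separate units connected by a buffer in memory */
void add_unit(const float *in1, const float *in2, float *out, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = in1[i] + in2[i];      /* result goes back to memory */
}

void filter_unit(onepole *f, const float *in, float *out, size_t n)
{
    float z = f->z;
    for (size_t i = 0; i < n; ++i) {
        z = z + f->a * (in[i] - z);    /* result is read back from memory */
        out[i] = z;
    }
    f->z = z;
}
```

both variants compute the same samples; the unfused one pays for an
extra store and load per sample, but add_unit and filter_unit can be
rewired at runtime without a compile cycle.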

however, there are some other aspects that one has to keep in mind:
- adding or multiplying two sample vectors can be done in steps of 4
samples, while filtering is a sample-wise operation
- if you have so many local variables that they no longer fit into the
registers, they will be spilled to the stack, and instead of saving
memory operations you end up with even more ...
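
to illustrate the first point, a small sketch in plain c (function
names are hypothetical, and the one-pole filter is just an assumed
example of a recursive filter). the add is written in steps of 4
samples, while the filter has to go sample by sample:

```c
#include <stddef.h>

/* adding has no dependency between samples, so four samples can be
 * handled per step (a compiler or hand-written simd code can map each
 * step to one 4-float vector add). assumes n is a multiple of 4. */
void add_by_4(const float *a, const float *b, float *out, size_t n)
{
    for (size_t i = 0; i < n; i += 4) {
        out[i]     = a[i]     + b[i];
        out[i + 1] = a[i + 1] + b[i + 1];
        out[i + 2] = a[i + 2] + b[i + 2];
        out[i + 3] = a[i + 3] + b[i + 3];
    }
}

/* a recursive filter needs the previous output to compute the next one,
 * so it walks through the block one sample at a time.
 * returns the new filter state. */
float onepole_block(float coeff, float state,
                    const float *in, float *out, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        state = state + coeff * (in[i] - state);  /* uses previous output */
        out[i] = state;
    }
    return state;
}
```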


> As I mentioned, dubnium would generate C code provided by the processing
> units then feed it to gcc. What happened basically is that each input
> and output of each processing unit was declared as a local variable in
> one function that would become the aggregation of some subgraph of
> processing units. So, if you had a patch which would multiply 4 numbers
> with a tree of "multiply" widgets with 2 inputs each, it would generate
> code something like:
> 
> the graph:
> 
>   in1         in2   in3        in4
>      \       /         \       /
>       [mult1]           [mult2]
>              \         /
>                [mult3]
>                   |
>                  out
> 
> static inline void multiply(float in1, float in2, float *out) {
>     // the body of this function is provided by the
>     // processing unit implementation
>     *out = in1 * in2;
> }

actually i am using a similar approach for implicit ugens, which add
memory chunks when you connect multiple signal outlets to one signal
inlet (see: source/kernel/ugen/add_ugen.hpp) ...
however, the tree structure you describe is only really efficient when
your data always stays in the registers. for my Add_Ugen class, the
maximum number of signal vectors that i add in one loop is 4, because of
the number of floating point registers on the sse unit of x86 cpus.
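
as a rough illustration (this is not the actual code from
add_ugen.hpp; the function name and signature are made up), summing
four signal vectors in one loop could look like this in plain c:

```c
#include <stddef.h>

/* hypothetical helper, not the real Add_Ugen: sums four input vectors
 * in a single pass, so each output sample is written once instead of
 * three times with chained two-input adds */
void add4(const float *in0, const float *in1, const float *in2,
          const float *in3, float *out, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = (in0[i] + in1[i]) + (in2[i] + in3[i]);
}
```

with more inputs per loop, more values have to be kept live at the same
time, which is where the register limit comes in.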

i am not really sure how a cross-ugen optimization could be realized
from a technical point of view, as it depends on the architecture
(how many registers the cpu provides), the algorithm complexity (how
many registers your algorithm needs) and the algorithm type
(vectorizable or not).
i somehow prefer to have efficient ugen implementations and
cache-friendly code ...

cheers, tim

--
[EMAIL PROTECTED]    ICQ: 96771783
http://tim.klingt.org

Which is more musical, a truck passing by a factory or a truck passing
by a music school?
  John Cage

_______________________________________________
nova-dev mailing list
[email protected]
http://klingt.org/cgi-bin/mailman/listinfo/nova-dev
http://tim.klingt.org/nova