Here's my current thinking.  Based on my present and foreseeable use-cases,
I see just a few conditions that would play into automatic prioritization:

   - (A) Does the DSP depend on a real-time input?
   - (B) Does the DSP factor into a real-time output?
   - (C) Does the DSP produce side-effects?  (e.g. observers, sends to the
   application thread)

Any chain of effects with exactly one input and one output could be grouped
into a single task with the same priority.  Junction points whose sole
input or sole output is such a chain could also be part of it.
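
As a rough sketch of that grouping pass (the Node type and its explicit
input/output edge lists here are hypothetical, not how any particular
framework represents its graph):

    // Sketch: collapse a linear run of one-in/one-out DSP nodes into a task.
    #include <vector>

    struct Node {
        std::vector<Node*> inputs, outputs;
        int task = -1;  // id of the chain/task this node belongs to
    };

    // Starting from 'head', walk downstream while the chain stays strictly
    // one-in/one-out, assigning every node along the way to the same task.
    void assignChain(Node* head, int taskId) {
        for (Node* n = head; n && n->task < 0; ) {
            n->task = taskId;
            if (n->outputs.size() != 1) break;      // fan-out: chain ends here
            Node* next = n->outputs[0];
            if (next->inputs.size() != 1) break;    // junction: chain ends here
            n = next;
        }
    }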

This would yield a selection of DSP jobs which would, by default, be
prioritized thus (see the sketch after the list):

   1. A+B+C
   2. A+B
   3. A+C
   4. B+C
   5. B
   6. C
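
For illustration, a minimal sketch of that default ranking, treating the
three conditions as bit flags (all names hypothetical):

    // Sketch: map the A/B/C conditions onto the default priority order above.
    // Lower rank runs first; rank 7 (neither B nor C) marks a job that could
    // potentially be skipped.
    enum Flags { RealTimeInput = 1, RealTimeOutput = 2, SideEffects = 4 };

    int defaultRank(unsigned f) {
        bool a = f & RealTimeInput, b = f & RealTimeOutput, c = f & SideEffects;
        if (a && b && c) return 1;  // A+B+C
        if (a && b)      return 2;  // A+B
        if (a && c)      return 3;  // A+C
        if (b && c)      return 4;  // B+C
        if (b)           return 5;  // B
        if (c)           return 6;  // C
        return 7;                   // no real-time output, no side-effects
    }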

Any DSPs which do not factor into real-time output or side-effects could
potentially be skipped (though it's worth considering that DSPs will
usually have state which we may want to keep updated).

Certain use-cases may favor quick completion of real-time processing over
low latency for observer data.  In that case, the following scheme could be
used instead:

   1. A+B (and A+B+C)
   2. B+C
   3. B
   4. A+C
   5. C

(Where priorities 4 and 5 may be processed after the real-time callback has
been satisfied.)

To make the prioritization more flexible, individual DSPs could be assigned
priority values above, below or between the automatic ones.  The priority
of a chain would be the priority of its most essential element, and chains
whose inputs have not yet been computed could be withheld from the priority
queue until they are ready for processing.
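
A minimal sketch of that withholding logic, under the assumption that each
chain knows its downstream successors (all names hypothetical):

    // Sketch: chains enter the ready queue only once all inputs are computed;
    // workers always pop the lowest rank (highest priority) next.
    #include <functional>
    #include <queue>
    #include <utility>
    #include <vector>

    struct Chain { int rank = 0; int pendingInputs = 0; std::vector<int> successors; };

    using ReadyQueue = std::priority_queue<std::pair<int, int>,   // (rank, chain id)
                                           std::vector<std::pair<int, int>>,
                                           std::greater<>>;       // min-rank first

    // Called when chain 'id' finishes: any successor whose inputs are now
    // all computed becomes eligible for processing.
    void onChainDone(int id, std::vector<Chain>& chains, ReadyQueue& ready) {
        for (int s : chains[id].successors)
            if (--chains[s].pendingInputs == 0)
                ready.push({chains[s].rank, s});
    }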

Note that I haven't given much consideration to scratch memory in the
scheme above.  The reason is that, as I'm coming to realize, the temporary
memory needs of even a complex DSP tree are small compared to permanent
memory such as samples, delay lines, et cetera.  And from what I've been
learning recently, cache locality gains from memory re-use are typically
most relevant within small pieces of code.  If anyone has evidence to the
contrary, though, I'd love to see it.

– Evan Balster
creator of imitone <http://imitone.com>

On Sun, Jul 31, 2016 at 3:55 PM, Ethan Fenn <et...@polyspectral.com> wrote:

> A few years ago I built a mixing engine for games. Some aspects of the
> design sound similar to what you're thinking about.
>
> Every audio frame (I think it was every 256 samples at 48k), the
> single-threaded "supervisor" would wake up and scan the graph of audio
> objects, figuring out what needed doing and what the dependencies were. As
> its output it produced a vector of "job" structures, describing the dsp to
> take place, where to pull the input and settings from and where to write
> the output. Then the pool of worker threads would wake up, pluck jobs one
> by one from the front of the array and get to work on them.
>
> I did it this way partly because one of the target platforms was the PS3
> with the Cell processor, and this was the preferred programming model for
> using the SPEs, the auxiliary computation cores of the Cell. But it mapped
> just fine onto other platforms.
>
> I didn't have any notion of "priority" -- everything in the graph just
> needed to get done every frame. My synchronization model was also pretty
> crude. When there were any dependencies on previous steps, I would insert a
> fence into the job vector, and workers reaching the fence would just wait
> until all the previous jobs were complete before moving on. Not the most
> flexible structure for sure, but I could get away with it because my mixing
> graphs were wide rather than deep -- a lot of sound effects playing at
> once, but only very few mix/effect buses. So all of the sound effect
> processing would get parallelized perfectly, which was really all I needed.
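>
> A minimal sketch of the shape of that job-vector-with-fences pattern (all
> names here are hypothetical; the real structures carried more settings):
>
>     // Sketch: workers claim entries from a per-frame job vector in order;
>     // each entry waits until 'prereq' earlier entries are complete, which
>     // makes a fence just an entry with no work and prereq == its own index.
>     #include <atomic>
>     #include <vector>
>
>     struct Job {
>         size_t prereq = 0;  // how many earlier entries must finish first
>         void (*process)(const float* in, float* out) = nullptr;  // null = fence
>         const float* input = nullptr;
>         float* output = nullptr;
>     };
>
>     std::vector<Job> jobs;             // rebuilt by the supervisor each frame
>     std::atomic<size_t> nextJob{0};    // reset to 0 each frame
>     std::atomic<size_t> completed{0};  // reset to 0 each frame
>
>     void workerLoop() {
>         for (;;) {
>             size_t i = nextJob.fetch_add(1);
>             if (i >= jobs.size()) return;
>             while (completed.load() < jobs[i].prereq)
>                 ;  // crude busy-wait, per the fence scheme described above
>             if (jobs[i].process)
>                 jobs[i].process(jobs[i].input, jobs[i].output);
>             completed.fetch_add(1);
>         }
>     }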
>
> This kind of model is too simple for lots of applications, and it may not
> do what you need. But I'd strongly recommend thinking about what your
> typical graphs are going to look like and about what the simplest possible
> supervisor would be that would handle that kind of graph well. Writing a
> perfect general scheduler is something you could easily spend 100% of your
> time doing if you wanted to, and you probably don't!
>
> -Ethan
>
>
> On Thu, Jul 28, 2016 at 4:46 PM, Evan Balster <e...@imitone.com> wrote:
>
>> Haha, Ross, I'm not sure I'll be going *quite* so deep just yet.
>>
>> My most pressing need is simply to access more processing power than one
>> callback will give me (without underflow).  To that end, I'll be setting up
>> a signaling system whereby one stream can have "helper threads" that are
>> notified when new input is available and do their best to keep output
>> available.  For the first implementation I'll permit the helper thread's
>> output to have higher latency...  Easy job.
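>>
>> A minimal sketch of that notification scheme, using a condition variable
>> (all names hypothetical):
>>
>>     // Sketch: a helper thread parked on a condition variable, woken when
>>     // the stream publishes new input.
>>     #include <condition_variable>
>>     #include <mutex>
>>
>>     std::mutex m;
>>     std::condition_variable cv;
>>     bool inputAvailable = false, shuttingDown = false;
>>
>>     void onNewInput() {  // beware calling this from the real-time callback:
>>                          // the lock is a priority-inversion hazard there
>>         { std::lock_guard<std::mutex> lock(m); inputAvailable = true; }
>>         cv.notify_one();
>>     }
>>
>>     void helperThread() {
>>         for (;;) {
>>             std::unique_lock<std::mutex> lock(m);
>>             cv.wait(lock, []{ return inputAvailable || shuttingDown; });
>>             if (shuttingDown) return;
>>             inputAvailable = false;
>>             lock.unlock();
>>             // ...render as much output ahead as the buffers allow...
>>         }
>>     }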
>>
>> From there, I'm interested in starting with a single-threaded supervisor
>> which can break the processing graph into chunks and run them according to
>> a simple prioritization scheme* while keeping the amount of scratch-memory
>> used within reasonable bounds.  In my current system, most DSPs just
>> operate on their output buffer, and a scratch-memory stack is made
>> available for any temporary allocations in the rendering tree...  That will
>> need to change, though.
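>>
>> For reference, that scratch-memory stack amounts to something like this
>> minimal sketch (names hypothetical; alignment beyond float is ignored):
>>
>>     // Sketch: LIFO scratch allocator; a subtree saves a mark, allocates
>>     // freely, and restores the mark when its rendering completes.
>>     #include <cstddef>
>>     #include <vector>
>>
>>     class ScratchStack {
>>     public:
>>         explicit ScratchStack(size_t bytes) : pool(bytes) {}
>>
>>         float* allocSamples(size_t count) {
>>             size_t bytes = count * sizeof(float);
>>             if (top + bytes > pool.size()) return nullptr;  // out of scratch
>>             float* p = reinterpret_cast<float*>(pool.data() + top);
>>             top += bytes;
>>             return p;
>>         }
>>
>>         size_t mark() const    { return top; }  // save before a subtree
>>         void release(size_t m) { top = m; }     // pop the subtree's buffers
>>
>>     private:
>>         std::vector<unsigned char> pool;  // one block, no real-time malloc
>>         size_t top = 0;
>>     };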
>>
>> Later on, though, I may very well investigate a supervisor with
>> multi-core capabilities.  (I certainly want to get a better grip on
>> multi-threaded scheduling for purposes outside DSP.)  I've relied thus far
>> on a small number of handy lock-free abstractions** to synchronize state in
>> my audio framework, but for things like worker threads I want to get a
>> feel for the practicalities of using things like condition variables in a
>> low-latency DSP system.
>>
>> – Evan Balster
>> creator of imitone <http://imitone.com>
>>
>> * I expect a "simple prioritization scheme" to prioritize different parts
>> of the graph depending on whether their inputs and/or outputs lead to
>> real-time or non-real-time sources or sinks.  For instance, a microphone
>> level metric might be prioritized quite high, while a synthesizer that
>> feeds into a recorder (and not the speakers) would be prioritized very low.
>>
>>
>> On Thu, Jul 28, 2016 at 12:20 AM, Ross Bencina <
>> rossb-li...@audiomulch.com> wrote:
>>
>>> Hi Evan,
>>>
>>> Greetings from my little cave deep in the multi-core scheduling rabbit
>>> hole! If multi-core is part of the plan, you may find that multi-core
>>> scheduling issues dominate the architecture. Here are a couple of starting
>>> points:
>>>
>>> Letz, Stephane; Fober, Dominique; Orlarey, Yann; Davis, Paul,
>>> "Jack Audio Server: MacOSX port and multi-processor version"
>>> Proceedings of the first Sound and Music Computing conference – SMC’04,
>>> pp. 177–183, 2004.
>>> http://www.grame.fr/ressources/publications/SMC-2004-033.pdf
>>>
>>> CppCon 2015: Pablo Halpern, “Work Stealing”
>>> https://www.youtube.com/watch?v=iLHNF7SgVN4
>>>
>>> Re: prioritization. Whether the goal is lowest latency or highest
>>> throughput, the solutions come under the category of Job Shop Scheduling
>>> Problems. Large classes of multi-worker multi-job-cost scheduling problems
>>> are NP-complete. I don't know where your particular problem sits. Work-
>>> stealing schedulers seem to be a popular approach, but I'm not sure about
>>> optimal heuristics for selection of work when there are multiple possible
>>> tasks to select -- it's further complicated by imperfect information about
>>> task cost (maybe the tasks have unpredictable run time), inter-core
>>> communication costs etc.
>>>
>>> Re: scratch storage allocation. For a single-core single-graph scenario
>>> you can use graph coloring (same as a compiler register allocator). For
>>> multi-core I guess you can do the same, but you might want to do something
>>> more dynamic. E.g. reuse a scratch buffer that is likely in the local
>>> CPU's cache.
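>>>
>>> A minimal sketch of that idea, assigning each intermediate value the
>>> lowest-numbered scratch buffer free over its lifetime, in the spirit of
>>> a linear-scan register allocator (all names hypothetical):
>>>
>>>     // Sketch: values get buffer indices; max index + 1 = buffers needed.
>>>     #include <algorithm>
>>>     #include <vector>
>>>
>>>     struct Lifetime { int birth, death; };  // first/last step using a value
>>>
>>>     std::vector<int> assignBuffers(const std::vector<Lifetime>& v) {
>>>         std::vector<int> result(v.size()), order(v.size());
>>>         for (size_t i = 0; i < v.size(); ++i) order[i] = int(i);
>>>         std::sort(order.begin(), order.end(),
>>>                   [&](int a, int b){ return v[a].birth < v[b].birth; });
>>>
>>>         std::vector<int> freeAt;  // step at which each buffer becomes free
>>>         for (int i : order) {
>>>             int chosen = -1;
>>>             for (size_t b = 0; b < freeAt.size(); ++b)
>>>                 if (freeAt[b] <= v[i].birth) { chosen = int(b); break; }
>>>             if (chosen < 0) { chosen = int(freeAt.size()); freeAt.push_back(0); }
>>>             freeAt[chosen] = v[i].death + 1;  // busy until after last use
>>>             result[i] = chosen;
>>>         }
>>>         return result;
>>>     }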
>>>
>>> Cheers,
>>>
>>> Ross.
>>>
>>>
>>>
>>> On 28/07/2016 5:38 AM, Evan Balster wrote:
>>>
>>>> Hello ---
>>>>
>>>> Some months ago on this list, Ross Bencina remarked about three
>>>> prevailing "structures" for DSP systems:  push, pull and *supervised
>>>> architectures*.  This got some wheels turning, and lately I've been
>>>> confronted by the need to squeeze out more performance by adding
>>>> multi-core support to my audio framework.
>>>>
>>>> I'm looking for wisdom or reference material on how to implement a
>>>> supervised DSP architecture.
>>>>
>>>> While I have a fairly solid idea as to how I might go about it, there
>>>> are a few functions (such as prioritization and scratch-space
>>>> management) which I think are going to require some additional thought.