A few years ago I built a mixing engine for games. Some aspects of the
design sound similar to what you're thinking about.

Every audio frame (I think it was every 256 samples at 48 kHz), the
single-threaded "supervisor" would wake up and scan the graph of audio
objects, figuring out what needed doing and what the dependencies were. As
its output it produced a vector of "job" structures, each describing the DSP
to take place, where to pull the input and settings from, and where to write
the output. Then the pool of worker threads would wake up, pluck jobs one
by one from the front of the vector, and get to work on them.
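
A minimal C++ sketch of that job-vector idea, not the original engine's
code. The `Job` fields, the gain-only "DSP", and the names `run_worker` and
`run_frame` are all assumptions made for illustration. It also spawns
threads on every call to stay short; a real engine would presumably keep
its pool alive and wake it once per frame.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical job record, as the supervisor might emit it.
struct Job {
    const float* input;   // where to pull input samples from
    float*       output;  // where to write the result
    std::size_t  frames;  // number of samples in this audio frame
    float        gain;    // stand-in for the job's "settings"
};

// A worker claims the next unclaimed job by bumping a shared index,
// and returns once the vector is exhausted.
void run_worker(const std::vector<Job>& jobs, std::atomic<std::size_t>& next) {
    for (;;) {
        const std::size_t i = next.fetch_add(1, std::memory_order_relaxed);
        if (i >= jobs.size()) return;
        const Job& job = jobs[i];
        assert(job.input != nullptr && job.output != nullptr);
        for (std::size_t n = 0; n < job.frames; ++n)
            job.output[n] = job.input[n] * job.gain;  // placeholder DSP
    }
}

// Process one frame's job vector with `worker_count` threads.
// Joining the threads makes every output buffer visible to the caller.
void run_frame(const std::vector<Job>& jobs, unsigned worker_count) {
    assert(worker_count > 0);
    std::atomic<std::size_t> next{0};
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < worker_count; ++w)
        pool.emplace_back(run_worker, std::cref(jobs), std::ref(next));
    for (auto& t : pool) t.join();
}
```

Because the jobs in this sketch are independent, the only shared state is
the claim index, so no locks are needed while a frame is being processed.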

I did it this way partly because one of the target platforms was the PS3
with the Cell processor, and this was the preferred programming model for
using the SPEs, the auxiliary computation cores of the Cell. But it mapped
just fine onto other platforms.

I didn't have any notion of "priority" -- everything in the graph just
needed to get done every frame. My synchronization model was also pretty
crude. When there were any dependencies on previous steps, I would insert a
fence into the job vector, and workers reaching the fence would just wait
until all the previous jobs were complete before moving on. Not the most
flexible structure for sure, but I could get away with it because my mixing
graphs were wide rather than deep -- a lot of sound effects playing at
once, but only a few mix/effect buses. So all of the sound effect
processing would get parallelized perfectly, which was really all I
needed.
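
A rough C++ sketch of the fence idea, again not the original code. The
`Entry` type, the convention that an empty `work` function marks a fence,
and the names `fenced_worker` and `run_with_fences` are all assumptions.
Workers claim entries in order, and anything at or after a fence spins
until every entry ahead of that fence has finished.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical job-vector entry; an empty `work` marks a fence.
struct Entry {
    std::function<void()> work;
};

inline Entry make_fence() { return Entry{}; }

// required[i] is the number of entries that must be complete before
// entry i may start, i.e. the index of the latest fence at or before i.
void fenced_worker(const std::vector<Entry>& entries,
                   const std::vector<std::size_t>& required,
                   std::atomic<std::size_t>& next,
                   std::atomic<std::size_t>& done) {
    for (;;) {
        const std::size_t i = next.fetch_add(1, std::memory_order_relaxed);
        if (i >= entries.size()) return;
        // Crude wait: spin until everything ahead of the fence is done.
        while (done.load(std::memory_order_acquire) < required[i])
            std::this_thread::yield();
        if (entries[i].work) entries[i].work();
        done.fetch_add(1, std::memory_order_release);
    }
}

void run_with_fences(const std::vector<Entry>& entries, unsigned worker_count) {
    assert(worker_count > 0);
    // Supervisor-side pass: record the latest fence position per entry.
    std::vector<std::size_t> required(entries.size(), 0);
    std::size_t last_fence = 0;
    for (std::size_t i = 0; i < entries.size(); ++i) {
        if (!entries[i].work) last_fence = i;
        required[i] = last_fence;
    }
    std::atomic<std::size_t> next{0};
    std::atomic<std::size_t> done{0};
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < worker_count; ++w)
        pool.emplace_back(fenced_worker, std::cref(entries), std::cref(required),
                          std::ref(next), std::ref(done));
    for (auto& t : pool) t.join();
}
```

For a wide, shallow graph the vector would look like many voice entries,
one fence, then a handful of bus entries. The voices run in parallel and
the workers idle only briefly at the fence.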

This kind of model is too simple for lots of applications, and it may not
do what you need. But I'd strongly recommend thinking about what your
typical graphs are going to look like and about what the simplest possible
supervisor would be that would handle that kind of graph well. Writing a
perfect general scheduler is something you could easily spend 100% of your
time doing if you wanted to, and you probably don't!

-Ethan


On Thu, Jul 28, 2016 at 4:46 PM, Evan Balster <e...@imitone.com> wrote:

> Haha, Ross, I'm not sure I'll be going *quite* so deep just yet.
>
> My most pressing need is simply to access more processing power than one
> callback will give me (without underflow).  To that end, I'll be setting up
> a signaling system whereby one stream can have "helper threads" that are
> notified when new input is available and do their best to keep output
> available.  For the first implementation I'll permit the helper thread's
> output to have higher latency...  Easy job.
>
> From there, I'm interested in starting with a single-threaded supervisor
> which can break the processing graph into chunks and run them according to
> a simple prioritization scheme* while keeping the amount of scratch-memory
> used within reasonable bounds.  In my current system, most DSPs just
> operate on their output buffer, and a scratch-memory stack is made
> available for any temporary allocations in the rendering tree...  That will
> need to change, though.
>
> Later on, though, I may very well investigate a supervisor with multi-core
> capabilities.  (I certainly want to get a better grip on multi-threaded
> scheduling for purposes outside DSP.)  I've relied thus far on a small
> number of handy lock-free abstractions** to synchronize state in my audio
> framework, but for things like worker threads I want to get a grip on the
> practicalities of using things like condition variables in a low-latency
> DSP system.
>
> – Evan Balster
> creator of imitone <http://imitone.com>
>
> * I expect a "simple prioritization scheme" to prioritize different parts
> of the graph depending on whether their inputs and/or outputs lead to
> real-time or non-real-time sources or sinks.  For instance, a microphone
> level metric might be quite high, while a synthesizer that feeds into a
> recorder (and not the speakers) would be very low.
>
>
> On Thu, Jul 28, 2016 at 12:20 AM, Ross Bencina <rossb-li...@audiomulch.com
> > wrote:
>
>> Hi Evan,
>>
>> Greetings from my little cave deep in the multi-core scheduling rabbit
>> hole! If multi-core is part of the plan, you may find that multicore
>> scheduling issues dominate the architecture. Here are a couple of starting
>> points:
>>
>> Letz, Stephane; Fober, Dominique; Orlarey, Yann; Davis, P.,
>> "Jack Audio Server: MacOSX port and multi-processor version"
>> Proceedings of the first Sound and Music Computing conference – SMC’04,
>> pp. 177–183, 2004.
>> http://www.grame.fr/ressources/publications/SMC-2004-033.pdf
>>
>> CppCon 2015: Pablo Halpern, "Work Stealing"
>> https://www.youtube.com/watch?v=iLHNF7SgVN4
>>
>> Re: prioritization. Whether the goal is lowest latency or highest
>> throughput, the solutions come under the category of Job Shop Scheduling
>> Problems. Large classes of multi-worker multi-job-cost scheduling problems
>> are NP-complete. I don't know where your particular problem sits. The Work
>> Stealing schedulers seem to be a popular procedure, but I'm not sure about
>> optimal heuristics for selection of work when there are multiple possible
>> tasks to select -- it's further complicated by imperfect information about
>> task cost (maybe the tasks have unpredictable run time), inter-core
>> communication costs etc.
>>
>> Re: scratch storage allocation. For a single-core single-graph scenario
>> you can use graph coloring (same as a compiler register allocator). For
>> multi-core I guess you can do the same, but you might want to do something
>> more dynamic, e.g. reuse a scratch buffer that is likely still in the
>> local CPU's cache.
>>
>> Cheers,
>>
>> Ross.
>>
>>
>>
>> On 28/07/2016 5:38 AM, Evan Balster wrote:
>>
>>> Hello ---
>>>
>>> Some months ago on this list, Ross Bencina remarked about three
>>> prevailing "structures" for DSP systems:  Push, pull and *supervised
>>> architectures*.  This got some wheels turning, and lately I've been
>>> confronted by the need to squeeze more performance by adding multi-core
>>> support to my audio framework.
>>>
>>> I'm looking for wisdom or reference material on how to implement a
>>> supervised DSP architecture.
>>>
>>> While I have a fairly solid idea as to how I might go about it, there
>>> are a few functions (such as prioritization and scratch-space
>>> management) which I think are going to require some additional thought.
>>>
>>> _______________________________________________
>> dupswapdrop: music-dsp mailing list
>> music-dsp@music.columbia.edu
>> https://lists.columbia.edu/mailman/listinfo/music-dsp
>>
>>
>