A few years ago I built a mixing engine for games. Some aspects of the design sound similar to what you're thinking about.
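To make the shape of the design described in this message concrete, here is a minimal C++ sketch of a per-frame job vector with fences, drained by a set of worker threads. It is an illustration under stated assumptions, not the code from that engine: the names `Job`, `run_frame` and `required` are hypothetical, the workers are spawned per call rather than kept alive in a pool, and waiting is done by spinning with `std::this_thread::yield()` where a real-time engine would choose its own wait primitive. Instead of blocking at the fence entry itself, each job records how many earlier jobs must finish before it may start, which should have the same effect.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// One entry of the per-frame job vector (hypothetical layout).
struct Job {
    bool is_fence;                // true: later jobs wait for all earlier jobs
    std::function<void()> work;   // stands in for "run this DSP on these buffers"
};

// Drain one frame's job vector with num_workers threads (hypothetical helper).
void run_frame(const std::vector<Job>& jobs, unsigned num_workers) {
    assert(num_workers > 0);

    // Supervisor side: for each job, record how many jobs precede the most
    // recent fence. That many jobs must be complete before it may start.
    std::vector<std::size_t> required(jobs.size(), 0);
    std::size_t jobs_seen = 0, barrier = 0;
    for (std::size_t i = 0; i < jobs.size(); ++i) {
        if (jobs[i].is_fence) {
            barrier = jobs_seen;
        } else {
            required[i] = barrier;
            ++jobs_seen;
        }
    }

    std::atomic<std::size_t> next{0};  // next unclaimed index in the vector
    std::atomic<std::size_t> done{0};  // number of completed (non-fence) jobs

    auto worker = [&]() {
        for (;;) {
            const std::size_t i = next.fetch_add(1);  // pluck from the front
            if (i >= jobs.size()) return;             // nothing left this frame
            if (jobs[i].is_fence) continue;           // fences carry no work
            while (done.load() < required[i])         // crude spin-wait
                std::this_thread::yield();
            jobs[i].work();
            done.fetch_add(1);
        }
    };

    // Assumption for brevity: threads are created and joined every frame.
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < num_workers; ++w) pool.emplace_back(worker);
    for (auto& t : pool) t.join();
}
```

With a vector of eight voice jobs, one fence, and one mix job, the voices can run in parallel and the mix job starts only after all eight have finished, which is the "wide rather than deep" case.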
Every audio frame (I think it was every 256 samples at 48 kHz), the single-threaded "supervisor" would wake up and scan the graph of audio objects, figuring out what needed doing and what the dependencies were. As its output it produced a vector of "job" structures, each describing the DSP to perform, where to pull the input and settings from, and where to write the output. Then the pool of worker threads would wake up, pluck jobs one by one from the front of the array and get to work on them.

I did it this way partly because one of the target platforms was the PS3 with the Cell processor, and this was the preferred programming model for using the SPEs, the auxiliary computation cores of the Cell. But it mapped just fine onto other platforms.

I didn't have any notion of "priority" -- everything in the graph just needed to get done every frame. My synchronization model was also pretty crude. When there were any dependencies on previous steps, I would insert a fence into the job vector, and workers reaching the fence would just wait until all the previous jobs were complete before moving on. Not the most flexible structure for sure, but I could get away with it because my mixing graphs were wide rather than deep -- a lot of sound effects playing at once, but only very few mix/effect buses. So all of the sound effect processing would get parallelized perfectly, which was really all I needed.

This kind of model is too simple for lots of applications, and it may not do what you need. But I'd strongly recommend thinking about what your typical graphs are going to look like and about what the simplest possible supervisor would be that would handle that kind of graph well. Writing a perfect general scheduler is something you could easily spend 100% of your time doing if you wanted to, and you probably don't!

-Ethan

On Thu, Jul 28, 2016 at 4:46 PM, Evan Balster <e...@imitone.com> wrote:

> Haha, Ross, I'm not sure I'll be going *quite* so deep just yet.
>
> My most pressing need is simply to access more processing power than one
> callback will give me (without underflow).  To that end, I'll be setting up
> a signaling system whereby one stream can have "helper threads" that are
> notified when new input is available and do their best to keep output
> available.  For the first implementation I'll permit the slave thread's
> output to have higher latency...  Easy job.
>
> From there, I'm interested in starting with a single-threaded supervisor
> which can break the processing graph into chunks and run them according to
> a simple prioritization scheme* while keeping the amount of scratch-memory
> used within reasonable bounds.  In my current system, most DSPs just
> operate on their output buffer, and a scratch-memory stack is made
> available for any temporary allocations in the rendering tree...  That will
> need to change, though.
>
> Later on, though, I may very well investigate a supervisor with multi-core
> capabilities.  (I certainly want to get a better grip on multi-threaded
> scheduling for purposes outside DSP.)  I've relied thus far on a small
> number of handy lock-free abstractions** to synchronize state in my audio
> framework, but for things like worker threads I want to get a grip on the
> practicalities of using things like condition variables in a low-latency
> DSP system.
>
> – Evan Balster
> creator of imitone <http://imitone.com>
>
> * I expect a "simple prioritization scheme" to prioritize different parts
> of the graph depending on whether their inputs and/or outputs lead to
> real-time or non-real-time sources or sinks.  For instance, a microphone
> level metric might be quite high, while a synthesizer that feeds into a
> recorder (and not the speakers) would be very low.
>
>
> On Thu, Jul 28, 2016 at 12:20 AM, Ross Bencina <rossb-li...@audiomulch.com> wrote:
>
>> Hi Evan,
>>
>> Greetings from my little cave deep in the multi-core scheduling rabbit
>> hole!
>> If multi-core is part of the plan, you may find that multicore
>> scheduling issues dominate the architecture. Here are a couple of starting
>> points:
>>
>> Letz, Stephane; Fober, Dominique; Orlarey, Yann; P.Davis,
>> "Jack Audio Server: MacOSX port and multi-processor version"
>> Proceedings of the first Sound and Music Computing conference – SMC’04,
>> pp. 177–183, 2004.
>> http://www.grame.fr/ressources/publications/SMC-2004-033.pdf
>>
>> CppCon 2015: Pablo Halpern “Work Stealing”
>> https://www.youtube.com/watch?v=iLHNF7SgVN4
>>
>> Re: prioritization. Whether the goal is lowest latency or highest
>> throughput, the solutions come under the category of Job Shop Scheduling
>> Problems. Large classes of multi-worker multi-job-cost scheduling problems
>> are NP-complete. I don't know where your particular problem sits. The Work
>> Stealing schedulers seem to be a popular procedure, but I'm not sure about
>> optimal heuristics for selection of work when there are multiple possible
>> tasks to select -- it's further complicated by imperfect information about
>> task cost (maybe the tasks have unpredictable run time), inter-core
>> communication costs etc.
>>
>> Re: scratch storage allocation. For a single-core single-graph scenario
>> you can use graph coloring (same as a compiler register allocator). For
>> multi-core I guess you can do the same, but you might want to do something
>> more dynamic. E.g. reuse a scratch buffer that is likely in the local CPU's
>> cache.
>>
>> Cheers,
>>
>> Ross.
>>
>>
>>
>> On 28/07/2016 5:38 AM, Evan Balster wrote:
>>
>>> Hello ---
>>>
>>> Some months ago on this list, Ross Bencina remarked about three
>>> prevailing "structures" for DSP systems:  Push, pull and *supervised
>>> architectures*.  This got some wheels turning, and lately I've been
>>> confronted by the need to squeeze more performance by adding multi-core
>>> support to my audio framework.
>>>
>>> I'm looking for wisdom or reference material on how to implement a
>>> supervised DSP architecture.
>>>
>>> While I have a fairly solid idea as to how I might go about it, there
>>> are a few functions (such as prioritization and scratch-space
>>> management) which I think are going to require some additional thought.
>>>
>>> _______________________________________________
>> dupswapdrop: music-dsp mailing list
>> music-dsp@music.columbia.edu
>> https://lists.columbia.edu/mailman/listinfo/music-dsp
>>
>>
>
> _______________________________________________
> dupswapdrop: music-dsp mailing list
> music-dsp@music.columbia.edu
> https://lists.columbia.edu/mailman/listinfo/music-dsp
>