On Wednesday 13 December 2000 01:16, Jack O'Quin wrote:
> I've been thinking about the requirements of realtime audio,
> including the many interesting comments on this list.  Clearly, you
> folks understand it very well, but it's still easy to underestimate
> the complexity of realtime programming.  The main difficulty is
> that "realtime" is a global attribute of the entire system, which
> any component can mess up by being careless in subtle ways.
>
> So, I've come to the unwelcome conclusion that in many cases,
> realtime audio probably requires at least *two* different-priority
> threads. Maybe three.  And, that is not even considering SMP.
>
> In support of this idea, I note that Csound implements three
> different cyclical rates at which variables can change their
> values.  Here is an excerpt from the Csound manual:
>
> csound> There are four possible rates:
> csound>
> csound>      1) once only, at orchestra setup time (effectively a
> csound>       permanent assignment);
> csound>      2) once at the beginning of each note (at
> csound>       initialization (init) time:  I-rate);
> csound>      3) once every performance-time control loop (perf time
> csound>       control rate, or  K-rate);
> csound>      4) once each sound sample of every control loop (perf
> csound>       time audio rate, or  A-rate).
>
> Borrowing this terminology only for purposes of discourse, I
> observe that most existing Linux realtime audio seems to run at
> either I-rate (various MIDI control events, GUI commands), or
> A-rate (hard disk recording, mixing).  I'm speaking somewhat
> loosely, here.  There may be some plugins that would like to use
> something in between, like the Csound K-rate.  Perhaps MIDI
> continuous controllers are in this category.  I don't know. 
> Perhaps not.
>
> I have observed discussions between I-rate (not "irate")
> programmers and A-rate programmers about whether MIDI is really a
> realtime protocol.  Clearly, it is.  Responses at least in the low
> millisecond range are essential in many cases for keyboard control
> input, for example.  Some would probably argue for even faster
> response times. But, I believe that Ardour has to deal with much
> tighter response requirements.  Handling complex MIDI requests
> while simultaneously reading or writing 24 (or more) channels of
> digital audio to disk is a very difficult challenge.
>
> In many sophisticated realtime systems there is a hierarchy of
> priorities representing different guaranteed latencies.

Yes, but streaming audio systems don't quite work like that. 

Actually, no RT systems do - all you achieve with more threads and 
more priority levels is more complexity - and *possibly* the ability 
to solve *some* problems better *most of the time*. Basically, big 
thread hierarchies are a recipe for practically non-deterministic code.


>  Code runs
> at a high priority when it needs to respond with low latency. 

Yes, but guess who gets to figure out how to guarantee that certain 
sequences of external events with high internal priorities don't 
cause lower priority threads to miss deadlines...? ;-)

And, keeping this audio related; the audio thread is usually higher 
latency, lower priority, but absolutely *hard* real time! Don't 
*ever* make the audio thread finish late, or you'll get audio 
drop-outs.

The alternative is to give MIDI event processing (not necessarily 
MIDI capturing and timestamping!) lower priority, basically giving 
MIDI higher worst case latency, in order to avoid having to figure 
out how many MIDI events you may process in a certain amount of time, 
and what to do if you run out of time...


Alternatively, you can just calculate the simplified absolute worst 
case scenario (ie all threads want to do their work at the "wrong" 
time), and set the CPU time margin according to that. That's 
relatively simple, and guaranteed to work, as long as you get the 
maths and the code right. No hidden complexity. (Oh, *do* remember to 
make sure that you take the maximum input event density [ie MIDI 
speed] into account! It might make your calculations look very 
pessimistic, but it really doesn't get better than that if you really 
need to be safe...)
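As a sketch of that calculation (the cycle and event costs used below are hypothetical placeholders, and `worst_case_fits` plus the wire-rate constants are my own names, not any real API - only the MIDI wire speed itself is a fixed fact):

```c
#include <assert.h>

/* MIDI runs at 31250 bits/s, 10 bits per byte on the wire, so at
 * most 3125 bytes/s can arrive; with running status, messages can
 * be as short as 2 bytes, giving a worst case of ~1562 events/s. */
#define MIDI_BYTES_PER_SEC  3125.0
#define MIDI_MSG_BYTES      2.0     /* pessimistic minimum */

/* Returns 1 if the absolute worst case (audio work plus the maximum
 * possible number of MIDI events) fits within one cycle, 0 if not.
 * All timing inputs are worst-case measurements, in microseconds. */
int worst_case_fits(double cycle_us,        /* one buffer period      */
                    double audio_wc_us,     /* audio work per cycle   */
                    double per_event_wc_us, /* work per MIDI event    */
                    double margin)          /* e.g. 0.7 = use 70% CPU */
{
    /* Most events that can possibly arrive during one cycle. */
    double max_events =
        (cycle_us / 1e6) * MIDI_BYTES_PER_SEC / MIDI_MSG_BYTES + 1.0;
    double total_us = audio_wc_us + max_events * per_event_wc_us;
    return total_us <= cycle_us * margin;
}
```

With a 1 ms cycle, at most two or three events can possibly arrive per cycle, so the pessimistic event term stays small - the point is simply that it is bounded.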


> This, in turn, places a requirement on every program running at or
> above that priority to keep its worst-case pathlength within some
> constrained limit.  Other realtime programs, running at lower
> priorities, can be allowed much longer pathlengths because their
> guaranteed response is not so tight,

"Tight" could be confusing in this context - actually, in the 
softsynth example, the lower prio thread (audio) has a *longer* code 
path per cycle, and higher acceptable latency, BUT it has a much 
harder deadline than the MIDI thread. It just *mustn't* be missed, 
while it's not the end of the world if a MIDI event should be 
slightly late.


> and because their lower
> priority allows them to run without affecting higher-priority
> activities.
> 
> I like to visualize this phenomenon as a "realtime response
> pyramid". High-priority tasks at the top of the pyramid must
> execute in a short time, symbolized by the pyramid's narrow width
> near its peak. Low-priority tasks nearer the pyramid's base, can
> run much longer, if necessary.

Yes, but don't forget that you also have to consider how the top of 
the pyramid affects the lower sections... Especially since we have 
this special case that the lower priority threads are hard real time, 
while the higher prio ones are not as hard - they should have lower 
average latency, but higher worst case latencies can be accepted.


Fortunately, we have an advantage over "normal" RT systems in that 
most of the data that requires heavy processing is array based, and 
allows processing to be chopped up into suitably sized portions, even 
on the fly. Thus, we can usually get away with defining a maximum 
acceptable latency (say 3 ms), a maximum acceptable average event 
response jitter (1 ms), and a maximum response latency (3 ms). Now, 
just make sure the whole system runs at least one cycle per "max 
average response jitter unit" (1 ms), set up buffering to get the 
desired maximum latency, and then make sure that there are no latency 
peaks that cause buffer underruns.

In softsynth terms; use a buffer size that results in <1 ms playback 
time per buffer, use 3 buffers total, check MIDI once per buffer, and 
make sure you don't get buffer underruns.
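Picking the period size for that scheme might look like this (the function name and the power-of-two restriction are my assumptions; many audio interfaces prefer power-of-two period sizes):

```c
#include <assert.h>

/* Largest power-of-two period length (in frames) whose playback
 * time stays under max_ms milliseconds at the given sample rate. */
unsigned frames_for_max_latency(unsigned sample_rate, double max_ms)
{
    unsigned limit = (unsigned)(sample_rate * max_ms / 1000.0);
    unsigned frames = 1;
    while (frames * 2 <= limit)
        frames *= 2;
    return frames;
}
```

With three such periods at 44.1 kHz that gives 3 × 32/44100 ≈ 2.2 ms of total buffering, within the 3 ms target above.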

If you want better timing accuracy for the MIDI events, you need to 
check MIDI in a separate thread running at higher priority than the 
audio thread, and timestamp events there.
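A minimal sketch of that split, assuming a single MIDI thread writing and a single audio thread reading (all names invented; memory barriers are omitted for brevity, and a real implementation needs them):

```c
#include <assert.h>

#define RING_SIZE 256   /* power of two, so we can mask indices */

typedef struct {
    unsigned long long timestamp; /* e.g. frame time of arrival */
    unsigned char data[3];
} MidiEvent;

/* Single-producer/single-consumer ring: the high-priority MIDI
 * thread only writes head, the audio thread only writes tail, so
 * no locking is needed between them. */
typedef struct {
    MidiEvent buf[RING_SIZE];
    volatile unsigned head;   /* written by MIDI thread only  */
    volatile unsigned tail;   /* written by audio thread only */
} MidiRing;

int ring_write(MidiRing *r, const MidiEvent *ev)
{
    unsigned next = (r->head + 1) & (RING_SIZE - 1);
    if (next == r->tail)
        return 0;               /* full: drop (or count) the event */
    r->buf[r->head] = *ev;
    r->head = next;
    return 1;
}

int ring_read(MidiRing *r, MidiEvent *ev)
{
    if (r->tail == r->head)
        return 0;               /* empty */
    *ev = r->buf[r->tail];
    r->tail = (r->tail + 1) & (RING_SIZE - 1);
    return 1;
}
```

The audio thread then pops only the events whose timestamps fall within the buffer it is about to render.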

If you need lower MIDI in -> audio out latency, you can eliminate 
approximately one buffer by lowering the CPU load to virtually zero - 
or by switching to RTLinux or RTAI, to practically eliminate the 
scheduling latency peaks.


> On Mon, 11 Dec 2000, Steve Harris wrote:
> > > I've a few plugins that can't be marked as HARD_RT_CAPABLE
> > > because thier cycle consumption varies too much when you change
> > > parameters (ie. they use parameter watching and only rebuild
> > > tables if they need to) otherwise they would be too slow with
> > > small chunk sizes. But this means that they can't be flagged as
> > > RT_CAPABLE, even though they don't use malloc or anything nasty
> > > like that.
> > >
> > > Is there any advantage to allowing them to flag that they have
> > > unpredictable CPU consumption, but are otherwise safe? I'm not
> > > sure if that helps the host.
>
> Using Csound terminology, the problem Steve describes is a case of
> A-rate code being modulated due to I-rate (or maybe K-rate) events.
> This makes the single HARD_RT_CAPABLE flag of LADSPA seem overly
> simplified.
>
> I realize there's probably nothing LADSPA, itself, can do about
> this. But, it seems that relatively sophisticated hosts may wish to
> implement the realtime response pyramid I've been describing. 
> Perhaps some already do.
>
> For this, it would be helpful for plugins to describe their
> realtime properties in terms of priority classifications.  Maybe,
> Steve could mark his plugin as A_RATE_CAPABLE, but with a parameter
> modification routine that is only I_RATE_CAPABLE, for example. 
> Perhaps that should be handled by creating a separate, but related
> plugin.  I don't know.

That is, classify plugins after their balance between how much the 
buffer size and individual events affect the execution time...? (What 
I mean is, "high quality RT" plugins would have very similar 
execution times no matter what, whereas "lower quality RT" plugins 
have higher peaks, and thus don't work in threads with very low 
I/O latency.)

Yes, that makes sense, but it can't be done on Linux/lowlatency, 
unless there's a *big* difference between the classes; ie 2 ms for 
the "high quality" thread and 20 ms for the "low quality" thread. Due 
to the CPU load, lack of timesharing and the high worst case 
latencies, smaller differences will only lead to complex interference 
phenomena that would make it very hard to guarantee that the lower 
priority thread can meet its deadlines.

Now, obviously, this is - at least in theory - different on SMP 
systems, where the threads don't necessarily compete for CPU time 
when they want to run at the same time...


> Benno Senoner <[EMAIL PROTECTED]> replied:
> > Hi, yes these plugins are a bit a dilemma, especially when it
> > comes to small block sizes.  How does one know for example that a
> > plug can work with a block size of X on a CPU with Y Mhz during
> > worst case scenarios ?  (user/audiomation software varies
> > parameters like mad (and thus causing recalculation of tables
> > during each run() cycle) In LADSPA it will probably not make
> > sense, but on MAIA we could implement a system where these
> > calculations are performed in a lower priority thread and where
> > the results are delivered in an atomic way to the run() callback.
> >  Otherwise [snip...]
>
> Benno seems to be thinking along similar lines.  I don't know
> enough to comment on his MAIA and LADSPA comparison, but I agree
> with his idea of recalculating tables in a lower priority thread.

Also note that such recalculation *can* be considered firm or even 
soft RT. If they are *inherently* non-deterministic, this separation 
would actually mean that "hard RT broken" plugins become "hard RT 
plugins with soft RT event response." The latter would be usable for 
all kinds of RT audio stuff, while the former would simply be useless 
for serious RT work, as they could cause audio drop-outs.
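Benno's atomic delivery could be sketched as a single pointer swap - the worker builds the table off-line and publishes it in one atomic operation, so run() never blocks and never sees a half-built table (names are mine, and this uses C11 atomics, not any existing plugin API):

```c
#include <assert.h>
#include <stdatomic.h>

typedef struct {
    float data[1024];
} Table;

static _Atomic(Table *) current_table;

/* Worker thread: build the new table off-line, then publish it.
 * Returns the old table so the *worker* can reuse or free it -
 * never free anything from the audio thread. */
Table *publish_table(Table *fresh)
{
    return atomic_exchange(&current_table, fresh);
}

/* Audio thread (run()): just load the pointer; wait-free. */
Table *acquire_table(void)
{
    return atomic_load(&current_table);
}
```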


> Essentially, I'm arguing that the extra complexity of describing a
> few distinct realtime priority classifications is worthwhile,
> because they actually simplify the truly difficult issue in
> realtime systems: spelling out very clearly what the realtime
> characteristics must be for every component.

I think it would be more interesting (and useful) to separate the 
parts of plugins, and then use the classification on the *parts*, 
rather than a fixed assembly of parts. (Which is currently everything 
in run(), and some other things.)

Fundamental parts:
        Instantiation
        Entering "standby" mode
        run()
        Responding to certain events/control changes
        Exiting "standby" mode
        Destruction

Now, the classification is still interesting, but I'm not sure 
there's much practical use for something very detailed right now. 
(Linux/lowlatency scheduling latency, high CPU load in audio threads 
etc.) OTOH, it's better to have some extra info than missing info! :-)

Anyway, I'm thinking about a two dimensional system:

        RT class:       None, Soft, Firm, Hard
        Scalability:    None, Low, Exact, High

An RT class of "None" means that the operation could take "ages" - 
the plugin might load a file, do some raytracing or whatever.

"Soft" means that the operation is usually performed in an amount of 
time that a user clicking a button with the mouse would perceive as 
"almost zero latency", ie a few ms; at most 100 ms or so. Most stuff 
will probably go here.

"Firm" is like "Soft", but the upper limit is somewhere around the 
time it takes an "ultra low latency" engine to complete one buffer 
cycle, ie around 1 ms.

"Hard" means that the operation takes at most as long as it takes on 
average to process one sample with the plugin. This is the only class 
of operations that can safely be used inside the audio thread of a 
lowlatency RT application - but you still have to take care not to 
overload the CPU, as not even these operations take zero time!

As for the Scalability axis:

"None" - this takes the same amount of time no matter what, ie it 
doesn't scale with buffer size.

"Low" means that the operation takes less time for smaller buffers, 
but not half the time for half the buffer size.

"Exact" indicates that the operation scales practically linearly with 
buffer size.

"High" means that halving the buffer size results in *less* than half 
the time for carrying out the operation.


Now, perhaps one should use figures instead of the Scalability 
classes, and perhaps even more factors - but how many plugin hackers 
would care to calculate all that properly...?
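As a sketch, the per-part classification could be expressed as a descriptor like this (all names invented - nothing here is LADSPA or MAIA API):

```c
#include <assert.h>

typedef enum { RT_NONE, RT_SOFT, RT_FIRM, RT_HARD } RTClass;
typedef enum { SCALE_NONE, SCALE_LOW, SCALE_EXACT, SCALE_HIGH } Scalability;

typedef struct {
    RTClass     rt_class;
    Scalability scalability;
} PartRating;

/* One rating per fundamental part, as listed above. */
typedef struct {
    PartRating instantiate;
    PartRating enter_standby;
    PartRating run;
    PartRating event_response;  /* certain events/control changes */
    PartRating exit_standby;
    PartRating destroy;
} PluginRTInfo;

/* Only "Hard" parts may be called from a low-latency audio thread. */
int safe_in_audio_thread(const PartRating *p)
{
    return p->rt_class == RT_HARD;
}
```

A host could then, for instance, run a plugin whose run() is Hard but whose event_response is only Soft by deferring the event handling to a lower priority thread.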


> This is not a fully thought out proposal, just some ideas I've been
> kicking around.  I'm open to comments and criticism.

Some more ideas above. Don't know if I'm making any sense at all...


//David

.- M A I A -------------------------------------------------.
|      Multimedia Application Integration Architecture      |
| A Free/Open Source Plugin API for Professional Multimedia |
`----------------------> http://www.linuxaudiodev.com/maia -'
.- David Olofson -------------------------------------------.
| Audio Hacker - Open Source Advocate - Singer - Songwriter |
`--------------------------------------> [EMAIL PROTECTED] -'
