Hello,
I ran into this scaling problem when developing the Ensemble Oscillator
<https://4mscompany.com/enosc.php>, which sums 1..16 uncorrelated voices.
Steven's advice is sound, but in practice the 1/sqrt(N) factor is
unsatisfactory for low N, because avoiding clipping is a hard
requirement in this case. For instance, if polyphony is 2 and you scale
by 0.707, clipping will occur very often. One solution is to adjust the
table by hand for the lowest values; another is to reason
probabilistically and compute the factor table so that clipping is
consistently *unlikely*. How unlikely is up to you; for instance, let's
set it to 10e-5 (i.e. 1e-4).
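To put a number on "very often": here's a quick Monte Carlo check (my
own illustration, not part of the original argument) of two full-scale
voices with rectangular PDFs, summed and scaled by 1/sqrt(2):

```python
import numpy as np

# Two uncorrelated voices, each uniform on [-1, 1], summed and scaled
# by 1/sqrt(2) ~= 0.707.  How often does the result leave [-1, 1]?
rng = np.random.default_rng(0)
n = 1_000_000
s = 0.7071 * (rng.uniform(-1, 1, n) + rng.uniform(-1, 1, n))
clip_rate = np.mean(np.abs(s) > 1.0)
print(f"clip rate: {clip_rate:.3f}")  # analytically (2 - sqrt(2))**2 / 4 ~= 0.086
```

Roughly one sample in twelve clips, which is clearly audible, not a
rare event.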
This is my original solution; I don't know how well-known it is, nor
whether it's 100% correct, but it "worked for me"™.
Suppose that your single voice signal has a rectangular PDF between -1 and
1. Now let's sum two of these voices: the PDF of the resulting signal is
the convolution of two such rectangles, i.e. a triangle centered around
zero spanning between -2 and 2. If we sum another voice on top of these
two, the PDF of the resulting signal will be the convolution of this
triangle with another rectangle, i.e. a smooth bump, symmetrical around
zero, that tapers off at -3 and 3. The more voices you add, the "bumpier"
this curve gets at zero, and the flatter the edges are at -N and N; for
N=+inf you obviously get a nice Gaussian.
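That convolution picture is easy to check numerically. Here's a sketch
(my own illustration, not from the original post) that builds the
N-voice PDF by repeated convolution and compares it against the
Gaussian with the matching variance N/3:

```python
import numpy as np

res = 512                 # samples per 2 units of amplitude
dx = 2.0 / res
rect = np.full(res, 0.5)  # uniform PDF on [-1, 1], height 1/2

def pdf_of_sum(n):
    """PDF of the sum of n uniform(-1, 1) voices, by repeated convolution."""
    p = rect
    for _ in range(n - 1):
        p = np.convolve(p, rect) * dx  # * dx turns the discrete sum into an integral
    return p

for n in (2, 4, 8, 16):
    p = pdf_of_sum(n)
    x = -n + np.arange(len(p)) * dx    # support of the sum: [-n, n]
    gauss = np.exp(-x**2 / (2 * n / 3)) / np.sqrt(2 * np.pi * n / 3)
    print(n, np.max(np.abs(p - gauss)))
```

The maximum deviation from the Gaussian shrinks as N grows, exactly as
the central limit theorem promises.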
By definition the area under this curve from l>0 to +inf
(\int_l^{+\infty} PDF(x) dx) is the probability that the signal exceeds
the value l. (Strictly, clipping is two-sided, so the probability is
twice that, but the factor of 2 barely moves l.) Therefore we can
compute the l such that this area is 10e-5, i.e. the signal amplitude
that makes clipping 10e-5-unlikely; 1/l is the factor we're looking
for. Note that since the tail is small and flat, we can approximate the
area by the value PDF(l) itself. For N=2 (triangular PDF), the curve
tapers off steeply around -2 and 2, so l will be very close to 2; but
as N gets bigger, the curve is flatter and flatter around -N and N, so
l falls further and further below N.
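For N=2 this l can even be worked out in closed form from the tail area
(my own check; the code below thresholds the PDF value instead, which
lands close to the same answer):

```python
import math

# N = 2: the summed PDF is the triangle f(x) = (2 - |x|) / 4 on [-2, 2],
# so the tail area from l to 2 is (2 - l)**2 / 8.  Set it equal to the
# 10e-5 target and solve for l:
p = 10e-5
l = 2 - math.sqrt(8 * p)
print(f"l = {l:.4f}, factor = {1 / l:.4f}")  # l ~= 1.9717, factor ~= 0.5072
```

So even at a 10e-5 tolerance, the safe factor for two voices is about
0.507, nowhere near 0.707.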
Here's a little Python that computes the table: (warning, untested code
pasted from my notes)
<<<
import numpy as np

size = 16          # table entries for 1..16 voices
resolution = 512   # samples per 2 units of amplitude
threshold = 10e-5  # target clipping probability

u = np.ones(resolution)  # (unnormalised) rectangular PDF on [-1, 1]
v = u.copy()
factors = []
for i in range(size):
    # Right half of the PDF, from 0 out to +N.  The appended 0 makes
    # sure argmax always finds a crossing, even when the curve never
    # dips below the threshold (as for N=1).
    half = np.append(u[len(u) // 2:], 0.)
    # First amplitude l at which the PDF drops below the threshold.
    l = np.argmax(half < threshold) / (resolution / 2.0)
    factors.append(1.0 / l)
    # Convolve in the next voice's rectangle.
    u = np.convolve(u, v) / resolution
>>>
The resulting l grows roughly like sqrt(N), so the table approaches a
1/sqrt(N)-style scaling for large N, but the divergence from 1/sqrt(N)
for small N is significant.
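To see that divergence concretely, here's a condensed version of the
same computation (my own check) printed side by side with the plain
1/sqrt(N) table:

```python
import numpy as np

size = 16
resolution = 512
threshold = 10e-5

u = np.ones(resolution)  # unnormalised rectangular PDF of one voice
v = u.copy()
prob, sqrt_n = [], []
for n in range(1, size + 1):
    half = np.append(u[len(u) // 2:], 0.0)  # right half plus a 0 sentinel
    l = np.argmax(half < threshold) / (resolution / 2.0)
    prob.append(1.0 / l)
    sqrt_n.append(1.0 / np.sqrt(n))
    u = np.convolve(u, v) / resolution

for n, (a, b) in enumerate(zip(prob, sqrt_n), 1):
    print(f"{n:2d}  probabilistic {a:.3f}   1/sqrt(N) {b:.3f}")
```

Note that the probabilistic factors sit well below 1/sqrt(N) across the
whole range, i.e. the guarantee costs headroom everywhere, not just at
small N.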
Hope this helps. I guess this approach is rather "pedestrian", so comments
are welcome!
-m
On Wed, Dec 16, 2020 at 10:03 AM Steven Cook <
[email protected]> wrote:
> Hi,
>
> I would suggest that you don't try changing the overall output level
> when individual voices play or stop playing, as that would introduce
> clicks. Instead, only change the level when the maximum number of voices
> is changed by the user.
>
> Multiple, uncorrelated voices sum according to sqrt(N), where N is the
> number of voices. So 16 voices need to be divided by 4. You'd be better
> off calculating 1/sqrt(N) and multiplying by that instead, as you'd save
> CPU resources, so 16 voices would be multiplied by 1/sqrt(16) = 0.25
>
> I've precalculated the scaling factors for 1 to 16 voices for greater
> efficiency, as follows:
>
> static const float VOICE_SCALE[] =
> {
>     1.0f,                // 1
>     0.7071067811865475f, // 2
>     0.5773502691896258f, // 3
>     0.5f,                // 4
>     0.4472135954999579f, // 5
>     0.4082482904638630f, // 6
>     0.3779644730092272f, // 7
>     0.3535533905932738f, // 8
>     0.3333333333333333f, // 9
>     0.3162277660168379f, // 10
>     0.3015113445777636f, // 11
>     0.2886751345948129f, // 12
>     0.2773500981126146f, // 13
>     0.2672612419124244f, // 14
>     0.2581988897471611f, // 15
>     0.25f                // 16
> };
>
> ------ Original Message ------
> From: "stefano chiappa" <[email protected]>
> To: [email protected]
> Sent: 16/12/2020 08:15:06
> Subject: How to add voices
>
> >Hi all,
> >
> >I'm developing in my free time a Software Synthesizer just for pleasure
> >of programming a synth, the project is on GitHub:
> >
> >https://github.com/kernel255/synassembler
> >
> >I'm now able to produce sound with my application, I have a basic
> >oscillator, and I can receive MIDI events to play notes.
> >I have an issue: when playing many notes together the signals add up
> >and I get some distortion. The fix looks simple but maybe it is not. I
> >have two ideas:
> >
> >1. I establish the maximum polyphony and divide the sum of signals by
> >this value. Indeed this is what happens in a real piano, organ,
> >etc., but this linear sum-and-divide may give a weak sound when I
> >play few notes
> >2. Maybe I could do the same but add some compression at the end, in
> >order to have stronger signals when there are few notes
> >
> >What do you think? Does anyone know what real digital synths do?
> >
> >Best Regards and thank you for your attention
> >Stefano
>