Hi Kassen,

I don't understand exactly what fluxus is doing with the FFT values, but I think what you describe as a variable curve is achieved in practice with either a filterbank or with smoothing or averaging of the FFT bins. This is used a lot in spectrograms, auditory modeling and audio coding (like MP3) to omit redundant information and better approximate human hearing. It can be done, for example, with an 'engineering' style 1/3-octave smoothing or, even better, with ERB (equivalent rectangular bandwidth) smoothing, which approximates human hearing much better. I think it can be done in the following steps:

- get the fft of the buffer (N bins)
- throw away the upper N/2-1 bins, as they are the complex conjugates of the lower half
- partition the remaining bins into successive bands according to ERBs (https://ccrma.stanford.edu/realsimple/aud_fb/Equivalent_Rectangular_Bandwidth_ERB.html): start from the lowest bin, compute the ERB at that frequency and use it to find the upper limit of the band, then take that limit as the lower frequency of the next band, and repeat until the Nyquist frequency is reached. That should result in approximately 40 bands covering the full range
- average the squared bins in each band to get the energy of each band
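
As a rough C++ sketch of the steps above (not taken from fluxus, and only one way to do it): the function names erbHz, erbBandEdges and bandEnergies are made up for illustration, and I am assuming the Glasberg and Moore approximation ERB(f) = 24.7 * (4.37 * f / 1000 + 1) Hz, with the input being the magnitudes of the lower N/2+1 bins. Band edges are rounded up to whole bins, so the lowest bands are one bin wide even where the ERB is narrower than a bin.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Assumed ERB approximation (Glasberg and Moore), f and result in Hz.
double erbHz(double f)
{
    return 24.7 * (4.37 * f / 1000.0 + 1.0);
}

// Band b covers bins [edges[b], edges[b+1]). The DC bin (0) is skipped.
std::vector<std::size_t> erbBandEdges(std::size_t fftSize, double sampleRate)
{
    assert(fftSize >= 2 && sampleRate > 0.0);
    const std::size_t numBins = fftSize / 2 + 1;   // DC up to Nyquist
    const double binHz = sampleRate / fftSize;     // width of one bin in Hz

    std::vector<std::size_t> edges;
    std::size_t lo = 1;
    edges.push_back(lo);
    while (lo < numBins)
    {
        // ERB evaluated at the lower edge, as in the steps above.
        const double loHz = lo * binHz;
        const double hiHz = loHz + erbHz(loHz);
        std::size_t hi = static_cast<std::size_t>(std::ceil(hiHz / binHz));
        hi = std::max(hi, lo + 1);     // at least one bin per band
        hi = std::min(hi, numBins);    // stop at Nyquist
        edges.push_back(hi);
        lo = hi;
    }
    return edges;
}

// Mean of the squared magnitudes in each band.
std::vector<double> bandEnergies(const std::vector<double>& magnitudes,
                                 const std::vector<std::size_t>& edges)
{
    assert(!edges.empty() && edges.back() <= magnitudes.size());
    std::vector<double> energy;
    for (std::size_t b = 0; b + 1 < edges.size(); b++)
    {
        double sum = 0.0;
        for (std::size_t i = edges[b]; i < edges[b + 1]; i++)
        {
            sum += magnitudes[i] * magnitudes[i];
        }
        energy.push_back(sum / static_cast<double>(edges[b + 1] - edges[b]));
    }
    return energy;
}
```

The edges only depend on the FFT size and sample rate, so they could be computed once and reused for every frame.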

I don't know C very well, but if it would be useful I could draft a MATLAB example.

cheers,
akis

On 19/07/2011 22:16, Kassen wrote:
Dear list,

I've been thinking about the below section of audioCollector.cpp;

    // seem to only have stuff in the lower half - something to do with nyquist?
    float UsefulArea = m_BufferLength/2;

    for (unsigned int n=0; n<m_NumBars; n++)
    {
        float Value = 0;

        float f = n/(float)m_NumBars;
        float t = (n+1)/(float)m_NumBars;
        f*=f;
        t*=t;
        unsigned from = f*UsefulArea;
        unsigned to = t*UsefulArea;

        for (unsigned int i=from; i<=to; i++)
        {
            if (i<m_BufferLength)
            {
                Value += m_FFTBuffer[i];
            }
        }

        if (Value<0) Value=-Value;
        Value*=m_Gain;
        m_FFTOutput[n]=((m_FFTOutput[n]*m_SmoothingBias)+Value*(1-m_SmoothingBias));
    }

    return m_FFTOutput;

This section maps the FFT bins to the "bars" that correspond to the different "gh" bands in Fluxus. The squaring that I highlighted (the "f*=f;" and "t*=t;" lines) seems to be an attempt to compensate for how the output of the FFT is linear in frequency; the whole of the second half of UsefulArea refers to the top-most octave of our bandwidth (likely from 11.025 to 22.05 kHz at a 44.1 kHz sample rate, or from "quite high" to "beyond the hearing"). Without such compensation 8 of the default 16 gh bands would cover that octave, 4 would cover the next one down, etc.

The issue is that squaring the numbers doesn't go far enough; the curve we need is logarithmic. Another issue is that the kind of curve we'd ideally get isn't the same for all numbers of bands. At 10 bands it's easy, as we can map each to one of the octaves (give or take) between 20 Hz and 20 kHz. On the other hand, when we get 512 bins from an FFT frame of 1024 samples, setting the requested number of bands to 512 should create a linear mapping of one band per FFT bin. Ideally the curve would crossfade between logarithmic and linear as the number of bands increases from a relatively low number to approaching the number of bins (having more bands than bins doesn't make much sense to me).
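
To make the crossfade idea a bit more concrete, here is one possible sketch in C++. Everything in it is an assumption for illustration rather than anything Fluxus does: the name bandEdgeHz, the 20 Hz lower limit, the choice of 10 bars as the point below which the mapping is purely logarithmic, and the simple linear blend between the two curves.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Frequency in Hz of band edge n (0..numBars) for a mapping that blends
// from logarithmic (few bars) to linear (as many bars as bins).
double bandEdgeHz(unsigned n, unsigned numBars, unsigned numBins, double nyquistHz)
{
    assert(numBars > 0 && n <= numBars && numBins > 0);
    const double kLowHz = 20.0;      // assumed lowest frequency of interest
    const unsigned kLogBars = 10;    // assumed: purely logarithmic at or below this

    const double x = n / static_cast<double>(numBars);
    const double logHz = kLowHz * std::pow(nyquistHz / kLowHz, x);
    const double linHz = x * nyquistHz;

    // Blend weight: 0 gives the logarithmic curve, 1 the linear one.
    double w;
    if (numBars >= numBins)
    {
        w = 1.0;
    }
    else if (numBars <= kLogBars)
    {
        w = 0.0;
    }
    else
    {
        w = (static_cast<double>(numBars) - kLogBars)
          / (static_cast<double>(numBins) - kLogBars);
    }
    w = std::min(1.0, std::max(0.0, w));

    return (1.0 - w) * logHz + w * linHz;
}
```

With 10 bars and a 20 kHz upper limit each band spans almost exactly one octave, and with 512 bars over 512 bins the edges fall on the bin boundaries. Whether a linear blend is the right shape in between is an open question.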

I'm not sure what kind of variable curve those concerns imply in practice, but I do think that currently too many (5 according to my calculations) of the 16 bands that we have by default are concerned with the top-most octave of our hearing. The highlighted lines are better than nothing under all realistic conditions that I can quickly think of, but I'd like to try coming up with a better plan.
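
The band counts can be checked with a few lines of C++. This is only a sketch: countTopOctaveBars is a made-up helper, not part of audioCollector.cpp, and it counts every bar whose upper edge lies above the midpoint of UsefulArea, so a bar that straddles the midpoint is included.

```cpp
#include <cassert>
#include <cmath>

// Number of bars that reach into the top octave (the upper half of the
// useful FFT range) when bar edges sit at (n/numBars)^exponent.
unsigned countTopOctaveBars(unsigned numBars, double exponent)
{
    assert(numBars > 0);
    unsigned count = 0;
    for (unsigned n = 0; n < numBars; n++)
    {
        // Upper edge of bar n as a fraction of the useful range.
        const double t = std::pow((n + 1) / static_cast<double>(numBars), exponent);
        if (t > 0.5) count++;
    }
    return count;
}
```

For 16 bars this gives 8 with an exponent of 1 (no compensation) and 5 with an exponent of 2 (the current squaring).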

Yours,
Kas.
