On 2016-08-07 14:28, Nicolas George wrote:

> You cannot compute the spectrum of a single sample; that does not make
> sense mathematically. The spectrum needs to be computed on the whole stream,
> or at least, if you want to observe how it evolves over time, over a large
> enough window.
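A minimal sketch of the trade-off behind "a large enough window", assuming a plain DFT over N samples. The bin spacing is sample_rate / N and the window lasts N / sample_rate seconds, so a longer window sharpens frequency resolution at the cost of time resolution. The product of the two is always 1. The window sizes below are illustrative only.

```python
def window_resolution(n_samples: int, sample_rate: float) -> tuple[float, float]:
    """Return (frequency bin spacing in Hz, window duration in seconds) for an N-point DFT."""
    bin_spacing_hz = sample_rate / n_samples
    duration_s = n_samples / sample_rate
    return bin_spacing_hz, duration_s


for n in (256, 1024, 4096):
    df, dt = window_resolution(n, 44100)
    print(f"N={n:5d}: {df:7.2f} Hz per bin, {dt * 1000:6.2f} ms per window")
# N=  256:  172.27 Hz per bin,   5.80 ms per window
# N= 1024:   43.07 Hz per bin,  23.22 ms per window
# N= 4096:   10.77 Hz per bin,  92.88 ms per window
```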

Then the mistake is mine; I explained it badly. Sorry about that, and allow me to rephrase.

I start with a mono audio file containing one song, a few minutes long. For simplicity, let's say its "quality" is arbitrarily high.
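One possible way to get mono PCM to work on is to decode the song with ffmpeg, for example `ffmpeg -i song.mp3 -ac 1 -ar 44100 -c:a pcm_s16le song.wav` (the file names are hypothetical). The WAV can then be read with Python's standard `wave` module. The sketch below assumes 16-bit mono PCM and scales the samples to [-1, 1).

```python
import array
import sys
import wave


def read_mono_wav(path: str) -> tuple[list[float], int]:
    """Read a 16-bit mono WAV file; return samples scaled to [-1, 1) and the sample rate."""
    with wave.open(path, "rb") as wf:
        if wf.getnchannels() != 1 or wf.getsampwidth() != 2:
            raise ValueError("expected 16-bit mono PCM")
        rate = wf.getframerate()
        raw = wf.readframes(wf.getnframes())
    pcm = array.array("h")  # signed 16-bit integers
    pcm.frombytes(raw)
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV sample data is little-endian
    return [s / 32768.0 for s in pcm], rate
```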

Using Python/NumPy or some other tool, I compute the spectrum of the whole song, either all at once if possible, or with a reasonably large sliding time window.
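The sliding-window approach is a short-time Fourier transform (STFT). Below is a minimal NumPy sketch with a periodic Hann window. `n_fft=1024` and `hop=512` are hypothetical choices, not values from this thread. Each row of the result is one time step and each column one frequency bin, so T = sample_rate / hop rows per second and F = n_fft / 2 + 1 bins.

```python
import numpy as np


def stft(x: np.ndarray, n_fft: int = 1024, hop: int = 512) -> np.ndarray:
    """Short-time Fourier transform with a periodic Hann window.

    Returns a complex matrix of shape (n_frames, n_fft // 2 + 1):
    one row per time step, one column per frequency bin.
    """
    window = np.hanning(n_fft + 1)[:-1]  # periodic Hann window
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


sr = 44100
t = np.arange(sr) / sr                # one second of audio
x = np.sin(2 * np.pi * 1000.0 * t)    # 1 kHz test tone
S = stft(x)
print(S.shape)                        # (85, 513)
print(sr / 512, "rows per second")    # 86.1328125 rows per second
```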

I store that spectrum in a matrix. Along the time dimension, the matrix has T rows per second (so the total number of rows depends on the length of the song). Along the frequency dimension, it has F columns (frequency buckets, or bins). Each cell holds one value stored in B bits (the color of the waterfall plot, or the height of a 3D plot of the spectrum).
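The matrix size follows directly from T, F and B: it costs T·F·B bits per second of audio. A small sketch follows. The example values (a 512-sample hop at 44.1 kHz, 513 bins, 16-bit cells, a 4-minute song) are hypothetical.

```python
def matrix_size(T: float, F: int, B: int, duration_s: float) -> tuple[float, float]:
    """Return (bits per second, total bits) for a spectrogram matrix with
    T rows per second, F frequency columns and B bits per cell."""
    bits_per_second = T * F * B
    return bits_per_second, bits_per_second * duration_s


rate, total = matrix_size(T=44100 / 512, F=513, B=16, duration_s=240)
print(f"{rate / 1000:.1f} kbit/s, {total / 8 / 1e6:.2f} MB")  # 707.0 kbit/s, 21.21 MB
```

With one 16-bit value per cell, this matrix already costs about as much as raw 16-bit mono PCM at 44.1 kHz (705.6 kbit/s). A complex STFT needs two values per cell, which doubles it.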

I then convert the matrix back into a PCM representation.
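If the matrix keeps the complex STFT values, PCM can be recovered by taking an inverse FFT of each row and doing a weighted overlap-add. The sketch below assumes a periodic Hann window and a hop of n_fft/4, both hypothetical choices. It is exact wherever the summed squared window is non-zero. If each cell keeps only one real value (a magnitude, as a waterfall plot does), the phase is lost and reconstruction needs a phase-estimation method such as Griffin-Lim; that is not shown here.

```python
import numpy as np


def stft(x, n_fft=1024, hop=256):
    """Forward STFT (periodic Hann window), one row per frame."""
    window = np.hanning(n_fft + 1)[:-1]
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def istft(S, n_fft=1024, hop=256):
    """Weighted overlap-add inverse of stft() above."""
    window = np.hanning(n_fft + 1)[:-1]
    frames = np.fft.irfft(S, n=n_fft, axis=1)
    length = n_fft + hop * (len(frames) - 1)
    out = np.zeros(length)
    norm = np.zeros(length)
    for i, frame in enumerate(frames):
        out[i * hop : i * hop + n_fft] += frame * window   # synthesis window
        norm[i * hop : i * hop + n_fft] += window ** 2     # analysis * synthesis weight
    nonzero = norm > 1e-8                                  # edges where no window covers the sample stay 0
    out[nonzero] /= norm[nonzero]
    return out
```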

I need to choose the matrix parameters T, F, and B so that the final PCM file carries about as much information (about the same "sound quality", however you want to define that) as one decoded from a 44.1 kHz, 128 kbps CBR MP3 file.
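To put the target in the same units as T·F·B, compare the uncompressed PCM bitrate with the 128 kbit/s budget. Bitrate is only a rough proxy for "information" here. MP3 spends its bits adaptively, guided by a psychoacoustic model, so equal bitrate does not guarantee equal perceived quality.

```python
def pcm_bitrate(sample_rate: int, bits_per_sample: int, channels: int) -> int:
    """Uncompressed PCM bitrate in bits per second."""
    return sample_rate * bits_per_sample * channels


TARGET = 128_000  # bits per second, the reference MP3
for ch in (1, 2):
    pcm = pcm_bitrate(44100, 16, ch)
    print(f"{ch} ch: PCM {pcm / 1000:.1f} kbit/s, about {pcm / TARGET:.1f}x the MP3 budget")
```

Under this reading, a matrix with T·F·B at or below roughly 128,000 bits per second matches the MP3 on size.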

My understanding is that the frequency bins do not have constant width; instead, the ratio between each bin's upper and lower frequency limits is constant (similar to octaves on a keyboard, but with a different ratio).
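A plain FFT gives constant-width bins (sample_rate / N Hz each). Constant-ratio bands are therefore usually built by grouping FFT bins, or by using a constant-Q transform. This is a minimal grouping sketch; `bands_per_octave` is a hypothetical parameter. As far as I know, MP3's own scalefactor bands widen with frequency but come from per-sample-rate tables rather than one fixed ratio. The lowest bands may end up with no FFT bin at all, which is the same time/frequency trade-off again.

```python
import bisect
import math


def log_band_edges(f_min: float, f_max: float, bands_per_octave: int) -> list[float]:
    """Band edges with a constant ratio 2**(1/bands_per_octave) between neighbours."""
    n_bands = math.ceil(bands_per_octave * math.log2(f_max / f_min))
    ratio = 2.0 ** (1.0 / bands_per_octave)
    return [f_min * ratio ** k for k in range(n_bands + 1)]


def group_fft_bins(n_fft: int, sample_rate: float, edges: list[float]) -> list[list[int]]:
    """Assign each linear FFT bin to the constant-ratio band containing its centre frequency."""
    bands: list[list[int]] = [[] for _ in range(len(edges) - 1)]
    for k in range(n_fft // 2 + 1):
        f = k * sample_rate / n_fft
        b = bisect.bisect_right(edges, f) - 1  # edges[b] <= f < edges[b + 1]
        if 0 <= b < len(bands):
            bands[b].append(k)
    return bands


edges = log_band_edges(20.0, 20000.0, bands_per_octave=3)
bands = group_fft_bins(1024, 44100, edges)
print(len(bands), "bands;", sum(1 for b in bands if not b), "of them contain no FFT bin")
```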

The purpose of this whole exercise is to run some computations on the full spectrum (the matrix). I need to minimize the size of the matrix while keeping the time and frequency resolutions reasonably good. I've decided that the "sound quality" of a 44.1 kHz, 128 kbps CBR MP3 is good enough, so I'm trying to imitate the time and frequency resolutions that MP3 uses.
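Shrinking the matrix largely comes down to choosing B. One common option is to quantize magnitudes uniformly on a dB scale. The sketch below assumes magnitudes are normalized so that 1.0 means full scale, and uses a hypothetical -96 dB floor. With that range, each step is 96 / (2**B - 1) dB: about 6.4 dB for B = 4 and 0.38 dB for B = 8.

```python
import numpy as np


def quantize_db(mag: np.ndarray, bits: int, floor_db: float = -96.0) -> np.ndarray:
    """Map magnitudes (1.0 = full scale) to integer codes 0..2**bits - 1 on a dB scale."""
    db = 20.0 * np.log10(np.maximum(mag, 1e-12))  # avoid log10(0)
    db = np.clip(db, floor_db, 0.0)
    levels = 2 ** bits - 1
    return np.round((db - floor_db) / -floor_db * levels).astype(np.uint16)


def dequantize_db(codes: np.ndarray, bits: int, floor_db: float = -96.0) -> np.ndarray:
    """Inverse of quantize_db(): codes back to linear magnitudes."""
    levels = 2 ** bits - 1
    db = codes.astype(float) / levels * -floor_db + floor_db
    return 10.0 ** (db / 20.0)
```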

I suspect the MP3 encoding algorithm is more complex than a fixed-size matrix, so I'm only asking for a rough, back-of-the-envelope approximation. How many rows per second, how many frequency buckets, and how many bits per cell do I need so that the result is no worse than that reference 44.1 kHz, 128 kbps MP3 file? It doesn't have to be exactly the same signal degradation; if it's subjectively close, that's enough for me.
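One back-of-the-envelope reading comes from the MPEG-1 Layer III frame structure. A frame holds 1152 samples split into two granules of 576. In each granule, a 32-subband filterbank followed by an 18-point MDCT yields 576 frequency lines for long blocks; transient passages switch to three short blocks of 192 lines, about 4.35 ms each. This sketch assumes all 128 kbit/s go to one mono channel (for stereo, roughly halve B). The resulting B is only an average. It also includes headers, side info and scalefactors, and real encoders spread bits very unevenly, with many high-frequency lines coded as zero.

```python
SAMPLE_RATE = 44100
BITRATE = 128_000
SAMPLES_PER_FRAME = 1152   # MPEG-1 Layer III
GRANULES_PER_FRAME = 2
LINES_PER_GRANULE = 576    # 32 subbands x 18 MDCT lines (long blocks)

frames_per_s = SAMPLE_RATE / SAMPLES_PER_FRAME
T = frames_per_s * GRANULES_PER_FRAME  # granules ("rows") per second
F = LINES_PER_GRANULE                  # frequency lines ("columns") per granule
B = BITRATE / (T * F)                  # average bits per coefficient

print(f"T = {T:.2f} rows/s, F = {F}, B = {B:.2f} bits per cell on average")
# T = 76.56 rows/s, F = 576, B = 2.90 bits per cell on average
print(f"line spacing = {SAMPLE_RATE / 2 / F:.2f} Hz, granule = {1000 * LINES_PER_GRANULE / SAMPLE_RATE:.2f} ms")
# line spacing = 38.28 Hz, granule = 13.06 ms
```

Because MP3's transform is critically sampled, T·F equals the sample rate. In this picture, the matrix has exactly as many cells per second as the PCM has samples.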

--
Florin Andrei
http://florin.myip.org/
_______________________________________________
ffmpeg-user mailing list
[email protected]
http://ffmpeg.org/mailman/listinfo/ffmpeg-user

To unsubscribe, visit link above, or email
[email protected] with subject "unsubscribe".
