Re: [FFmpeg-user] resolution of the waterfall diagram of typical mp3 file

Florin Andrei Sun, 07 Aug 2016 15:49:06 -0700

On 2016-08-07 14:28, Nicolas George wrote:

You can not compute the spectrum of a single sample, that does not make
sense mathematically. The spectrum needs to be computed on the whole stream,
or at least, if you want to observe how it evolves during time, over a
window large enough.

Then it's my mistake. I'm explaining it wrong - sorry for that, and allow me to rephrase.

I start with a mono audio file containing one song - a few minutes of audio. Let's say the "quality" here is arbitrarily high, for simplicity.

Using python/numpy or some other tools, I calculate the spectrum of the whole song, either all at once if possible, or using a reasonably large, shifting time window.

I store that spectrum in a matrix. In the time dimension, the matrix has T rows per second (so the total number of rows depends on the length of the song). In the frequency dimension, the matrix has F columns (frequency buckets or bins). In each cell, I store one value using B bits (the color of the waterfall, or the height of the 3D representation of the spectrum).
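
To make the layout concrete, here is a minimal numpy sketch of such a matrix. The function name, the window size, the hop, the Hann window and the 96 dB range are all illustrative assumptions of mine, not values taken from MP3. Note that it keeps only the magnitude and throws the phase away.

```python
import numpy as np

def spectrogram_matrix(pcm, sample_rate, window_size=1024, hop=512, bits=8):
    """Return (cells, rows_per_second); cells has shape (time frames, frequency bins).

    All default parameters are placeholders, not MP3 values.
    """
    window = np.hanning(window_size)
    n_frames = 1 + (len(pcm) - window_size) // hop
    # Slice the signal into overlapping, windowed frames (one row per frame).
    frames = np.stack([pcm[i * hop : i * hop + window_size] * window
                       for i in range(n_frames)])
    # One spectrum per frame; rfft gives window_size // 2 + 1 frequency bins.
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    # Quantize the log-magnitude to `bits` bits per cell over an assumed 96 dB range.
    db = 20 * np.log10(np.maximum(magnitude, 1e-12))
    db = np.clip(db - db.max(), -96.0, 0.0)
    levels = 2 ** bits - 1
    cells = np.round((db + 96.0) / 96.0 * levels).astype(np.uint16)
    return cells, sample_rate / hop
```

With these placeholder settings, T is sample_rate / hop rows per second, F is window_size // 2 + 1 bins, and B is the bits argument.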


I then convert the matrix back into a PCM representation.
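
For the conversion back, a rough sketch of what I have in mind is an overlap-add inverse of the windowed transform. This assumes the matrix keeps the complex values (magnitude and phase), which is more than a plain waterfall stores; the function names, the window size and the hop are placeholders.

```python
import numpy as np

def stft(pcm, window_size=1024, hop=256):
    """Complex spectrum matrix, shape (time frames, frequency bins)."""
    window = np.hanning(window_size)
    n_frames = 1 + (len(pcm) - window_size) // hop
    frames = np.stack([pcm[i * hop : i * hop + window_size] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(matrix, window_size=1024, hop=256):
    """Overlap-add reconstruction of PCM from the complex matrix."""
    window = np.hanning(window_size)
    n_frames = matrix.shape[0]
    length = window_size + hop * (n_frames - 1)
    pcm = np.zeros(length)
    norm = np.zeros(length)
    frames = np.fft.irfft(matrix, n=window_size, axis=1)
    for i in range(n_frames):
        start = i * hop
        # Apply the window again on synthesis and accumulate overlapping frames.
        pcm[start : start + window_size] += frames[i] * window
        norm[start : start + window_size] += window ** 2
    # Divide by the summed squared window; the first and last few samples,
    # where that sum is close to zero, are not recovered accurately.
    return pcm / np.maximum(norm, 1e-8)
```

Without quantization the round trip is essentially exact away from the edges, so any loss in the final PCM would come from the choice of T, F and B.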

I need to determine the matrix parameters T, F, and B, so that the final PCM file has about as much information (about the same "sound quality", however you want to define that) as if it was extracted from an MP3 file, 44.1 kHz, 128 kbps CBR.
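
The crudest way I can state the constraint is as a raw bit budget: T * F * B bits per second for the matrix, compared against 128000 bits per second for the MP3. The sketch below is only that arithmetic; the helper names and the example values of T and F are hypothetical, and it ignores whatever entropy coding and psychoacoustic tricks MP3 applies.

```python
TARGET_BPS = 128_000  # 128 kbps CBR reference

def matrix_bitrate(rows_per_second, freq_bins, bits_per_cell):
    """Raw, uncompressed size of the matrix in bits per second (T * F * B)."""
    return rows_per_second * freq_bins * bits_per_cell

def max_bits_per_cell(rows_per_second, freq_bins, target_bps=TARGET_BPS):
    """Largest average B that keeps T * F * B within the target bitrate."""
    return target_bps / (rows_per_second * freq_bins)

# Hypothetical example: T = 86 rows per second, F = 512 bins.
print(matrix_bitrate(86, 512, 3))
print(max_bits_per_cell(86, 512))
```

Under this naive accounting, a matrix with roughly as many cells per second as the PCM has samples per second leaves only about 3 bits per cell on average.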

I understand that the frequency bins do not have constant width, but rather their upper/lower frequency limits have constant ratio (similar to octaves on a keyboard, but different ratio here).
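
If that understanding is right, the bin edges would form a geometric series. A small sketch of what I mean follows; the function name and the frequency range are placeholders, and whether MP3 really lays out its bins this way is exactly the part I am unsure about.

```python
def constant_ratio_edges(f_low, f_high, n_bins):
    """Bin edges where each upper limit is a fixed multiple of the lower limit."""
    ratio = (f_high / f_low) ** (1.0 / n_bins)
    return [f_low * ratio ** k for k in range(n_bins + 1)]

# Hypothetical example: 10 bins from 20 Hz to 20480 Hz gives a ratio of 2,
# i.e. one bin per octave.
edges = constant_ratio_edges(20.0, 20480.0, 10)
print([round(e, 1) for e in edges])
```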

The purpose of this whole exercise is to run some computations on the full spectrum (the matrix). I need to minimize the size of the matrix, while keeping the time and frequency resolutions pretty decent. I've decided that the "sound quality" of MP3 / 44.1 / 128 CBR is good enough, so I'm trying to imitate those respective resolutions, as used by MP3.

I suspect the MP3 encoding algorithm is more complex than using a fixed size matrix, so I'm only asking for a rough approximation, like a back of the envelope estimate. How many rows per second, how many frequency buckets, how many bits per cell, so that the result is not worse than that reference MP3/44.1/128 file? It doesn't have to be the exact same signal degradation, but if it's subjectively close then that's enough for me.


--
Florin Andrei
http://florin.myip.org/