Re: [music-dsp] Sliding Phase Vocoder (was FIR blog post & interactive demo)

Andreas Gustafsson Mon, 13 Apr 2020 13:55:51 -0700

Hello Spencer,

You wrote:
> A while ago I read through some of the literature [1] on implementing
> an invertible CQT as a special case of the Nonstationary Gabor
> Transform. It's implemented by the essentia library [2], probably
> among other places.
> 
> The main idea is that you take the FFT of your whole signal, then
> apply the filter bank in the frequency domain (just
> multiplication). Then you IFFT each filtered signal, which gives you
> the time-domain samples for each band of the filter bank. Each
> frequency-domain filter has a different bandwidth, so your IFFT is a
> different length for each one, which gives you the different sample
> rates for each one.

That's the basic idea, but the Gaborator rounds up each of the
per-band sample rates to the original sample rate divided by some
power of two.  This means all the FFT sizes can be powers of two,
which tend to be faster than arbitrary sizes.  It also results in a
nicely regular time-frequency sampling grid where many of the samples
coincide in time, as shown in the second plot on this page:

  https://www.gaborator.com/gaborator-1.4/doc/overview.html
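A minimal Python/NumPy sketch of the analysis described above: one FFT of the whole signal, a Gaussian window per band applied by multiplication, and one short IFFT per band whose length is rounded up to n / 2**j so that each band's sample rate is the input rate divided by a power of two. The band layout (geometric centres, width proportional to the centre, truncation at +/- 4 sigma) and the function name nsgt_analysis are assumptions for illustration. This is not the Gaborator's API or its actual window design, and it omits the lowpass band below the lowest centre.

```python
import numpy as np

def nsgt_analysis(x, f_min_bin=8.0, bands_per_octave=12):
    """Hypothetical NSGT-style constant-Q analysis: FFT once, window each band
    in the frequency domain, then IFFT each band at its own length."""
    n = len(x)                        # assumed to be a power of two
    X = np.fft.rfft(x)                # one-sided spectrum, bins 0 .. n/2
    nyq = n // 2
    bins = np.arange(nyq + 1)
    n_bands = int(np.floor(bands_per_octave * np.log2(nyq / f_min_bin))) + 1
    bands = []
    for k in range(n_bands):
        fc = f_min_bin * 2.0 ** (k / bands_per_octave)        # geometric centre, in bins
        sigma = fc * (2.0 ** (1.0 / bands_per_octave) - 1.0)  # width proportional to fc: constant Q
        lo = max(0, int(np.floor(fc - 4 * sigma)))             # truncate the Gaussian at +/- 4 sigma
        hi = min(nyq, int(np.ceil(fc + 4 * sigma)))
        g = np.exp(-0.5 * ((bins[lo:hi + 1] - fc) / sigma) ** 2)
        support = hi - lo + 1
        # Round the IFFT length up to n / 2**j, so the band's sample rate is
        # the input sample rate divided by a power of two.
        m = n
        while m // 2 >= support:
            m //= 2
        spectrum = np.zeros(m, dtype=complex)
        spectrum[:support] = X[lo:hi + 1] * g                  # band shifted down to baseband
        bands.append((fc, np.fft.ifft(spectrum)))
    return bands

x = np.random.default_rng(0).standard_normal(1024)
for fc, coeffs in nsgt_analysis(x)[::12]:                      # one band per octave
    print(f"centre bin {fc:6.1f}: {len(coeffs):4d} samples (rate = input rate / {1024 // len(coeffs)})")
```

The narrow low bands end up with short IFFTs (low sample rates) and the wide high bands with longer ones, and every length divides n, which gives the regular time-frequency grid in which many samples line up in time.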

Also, the Gaborator makes use of multirate processing where the signal
is repeatedly decimated by 2 and the calculations for the lower
octaves run at successively lower sample rates.  These optimizations
help the Gaborator achieve a performance of millions of samples per
second per CPU core.
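The repeated decimation by 2 can be sketched in Python/NumPy as an octave pyramid. An ideal (brick-wall) lowpass in the FFT domain stands in here for whatever decimation filter the Gaborator actually uses, and decimate_by_2 and octave_pyramid are hypothetical names.

```python
import numpy as np

def decimate_by_2(x):
    """Halve the sample rate of a real, even-length signal with an ideal
    lowpass applied in the FFT domain (a stand-in decimation filter)."""
    n = len(x)
    X = np.fft.rfft(x)
    # Keep only bins up to the new Nyquist frequency (bin n/4). The factor 0.5
    # compensates for irfft normalising by the shorter length n/2.
    return 0.5 * np.fft.irfft(X[: n // 4 + 1], n=n // 2)

def octave_pyramid(x, levels):
    """Return [x, x decimated once, x decimated twice, ...]. Each lower
    octave can then be analysed at its own, lower sample rate."""
    out = [np.asarray(x, dtype=float)]
    for _ in range(levels):
        out.append(decimate_by_2(out[-1]))
    return out

n = 1024
t = np.arange(n)
low = np.cos(2 * np.pi * 5 * t / n)     # well below the new Nyquist: survives
high = np.cos(2 * np.pi * 300 * t / n)  # above bin n/4 = 256: removed
print([len(s) for s in octave_pyramid(low, 3)])   # [1024, 512, 256, 128]
print(np.allclose(decimate_by_2(low), low[::2]), np.allclose(decimate_by_2(high), 0))  # True True
```

Each level halves the amount of data, so the work for all lower octaves together is bounded by roughly the work for the top octave.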

> They also give an "online" version where you do
> the processing in chunks, but really for this to work I think you'd
> need large-ish chunks so the latency would be pretty bad.

The Gaborator also works in chunks.  A typical chunk size might be
8192 samples, but thanks to the multirate processing, in the lowest
frequency bands, each of those 8192 samples may represent the
low-frequency content of something like 1024 samples of the original
signal.  This gives an effective chunk size of some 8 million samples
without actually having to perform any FFTs that large.
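The effective chunk size quoted above is just the chunk length times the decimation factor of the lowest bands. The 44.1 kHz sample rate below is an assumption, used only to put the figure in seconds.

```python
import math

chunk = 8192          # samples per chunk (figure from the text)
decimation = 1024     # lowest bands run at 1/1024 of the input rate (figure from the text)
sample_rate = 44100   # assumed, for illustration only

octaves_down = int(math.log2(decimation))   # number of halvings of the sample rate
effective_chunk = chunk * decimation        # input samples covered by one low-band chunk
print(octaves_down, effective_chunk, round(effective_chunk / sample_rate, 1))  # 10 8388608 190.2
```

At that assumed rate, one low-band chunk spans a bit over three minutes of input.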

Latency is certainly high, but I would not say it is a consequence of
the chunk size as such.  Rather, both the high latency and the need
for a large (effective) chunk size are consequences of the lengths of
the band filter impulse responses, which get exponentially larger as
the constant-Q bands get narrower towards lower frequencies.
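Why the impulse responses grow exponentially towards low frequencies can be checked with a Gaussian band filter: its time-domain spread is inversely proportional to its bandwidth, and in a constant-Q design the bandwidth is proportional to the centre frequency, so each octave down doubles the length. The proportionality constant and the +/- 4 sigma "extent" below are assumptions for illustration, not the Gaborator's parameters.

```python
import numpy as np

def gaussian_time_spread(fc_hz, bands_per_octave=12):
    """Time-domain standard deviation (seconds) of a Gaussian band filter whose
    frequency-domain standard deviation is proportional to its centre
    frequency (constant Q). The proportionality constant is an assumption."""
    sigma_f = fc_hz * (2.0 ** (1.0 / bands_per_octave) - 1.0)
    # exp(-t**2 / (2 sigma_t**2)) has a Gaussian spectrum whose standard
    # deviation is 1 / (2 pi sigma_t) Hz, so sigma_t = 1 / (2 pi sigma_f).
    return 1.0 / (2.0 * np.pi * sigma_f)

for fc in [1280.0, 640.0, 320.0, 160.0, 80.0, 40.0, 20.0]:
    sigma_t = gaussian_time_spread(fc)
    # Taking +/- 4 sigma_t as the impulse response's effective extent.
    print(f"{fc:6.0f} Hz: sigma_t = {1e3 * sigma_t:7.1f} ms, extent ~ {8e3 * sigma_t:7.0f} ms")
```

A filter centred on a given output time needs input up to about half its extent ahead of that time, which is one way to see why the lowest bands dominate the latency.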

Latency in the Gaborator is discussed in more detail here:

  https://www.gaborator.com/gaborator-1.4/doc/realtime.html

> The whole process is in some ways dual to the usual STFT process,
> where we first window and then FFT. In the NSGT you first FFT and
> then window, and then IFFT each band to get a Time-Frequency
> representation.

Yes.
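One way to see the duality: windowing the spectrum and then taking the IFFT is the same as filtering the signal with the window's impulse response (circular convolution), just as an STFT frame is a time-windowed FFT. The Gaussian band window below is hypothetical, and the band is kept at full rate (no subsampling) so the two computations can be compared sample by sample.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 256
x = rng.standard_normal(n)

# A hypothetical Gaussian band window defined directly on the FFT bins.
k = np.arange(n)
G = np.exp(-0.5 * ((k - 40) / 6.0) ** 2)

# NSGT order: FFT first, then window (multiply), then IFFT.
band_freq = np.fft.ifft(np.fft.fft(x) * G)

# The same band in the time domain: circular convolution of x with the
# window's impulse response g = IFFT(G).
g = np.fft.ifft(G)
band_time = np.array([np.sum(x * g[(t - np.arange(n)) % n]) for t in range(n)])

print(np.allclose(band_freq, band_time))   # True
```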

> For resynthesis you end up with a similar window overlap constraint
> as in the STFT, except now the windows are in the frequency domain.
> It's a little more complicated because the window centers aren't
> evenly spaced, so creating COLA windows is harder. There are some
> fancier approaches to designing a set of synthesis windows that are
> complementary to (inverses of) the analysis windows, which is what
> frame-theory folks like that Austrian group tend to use.

The Gaborator was inspired by the papers from that Austrian group and
uses complementary resynthesis windows, or "duals" as frame theorists
like to call them.  The analysis windows are Gaussian, and the dual
windows used for resynthesis end up being slightly distorted
Gaussians.
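A Python/NumPy sketch of resynthesis with dual windows, assuming the "painless" case from the NSGT literature: each band's IFFT is at least as long as its window's frequency support, so the frame operator is diagonal in frequency and the canonical dual of window g_k is g_k divided by the sum over j of |g_j|^2. The Gaussian constant-Q layout, the extra lowpass window (added so every bin is covered) and all function names are hypothetical; for brevity each IFFT length equals its window's support rather than being rounded up to a power of two. This is not the Gaborator's code.

```python
import numpy as np

def make_windows(n, f_min_bin=8.0, bands_per_octave=12):
    """Hypothetical Gaussian constant-Q windows on rfft bins 0..n/2, as
    (first_bin, samples) pairs, plus a lowpass window below f_min_bin."""
    nyq = n // 2
    bins = np.arange(nyq + 1)
    sigma0 = f_min_bin / 2.0
    hi0 = min(nyq, int(np.ceil(4 * sigma0)))
    windows = [(0, np.exp(-0.5 * (bins[: hi0 + 1] / sigma0) ** 2))]
    n_bands = int(np.floor(bands_per_octave * np.log2(nyq / f_min_bin))) + 1
    for k in range(n_bands):
        fc = f_min_bin * 2.0 ** (k / bands_per_octave)
        sigma = fc * (2.0 ** (1.0 / bands_per_octave) - 1.0)
        lo = max(0, int(np.floor(fc - 4 * sigma)))
        hi = min(nyq, int(np.ceil(fc + 4 * sigma)))
        windows.append((lo, np.exp(-0.5 * ((bins[lo:hi + 1] - fc) / sigma) ** 2)))
    return windows

def dual_windows(n, windows):
    """Canonical duals in the painless case: g_k / sum_j |g_j|^2."""
    total = np.zeros(n // 2 + 1)
    for lo, g in windows:
        total[lo:lo + len(g)] += np.abs(g) ** 2
    return [(lo, g / total[lo:lo + len(g)]) for lo, g in windows]

def analyze(x, windows):
    X = np.fft.rfft(x)
    # One IFFT per band, as long as the window's support (band at baseband).
    return [np.fft.ifft(X[lo:lo + len(g)] * g) for lo, g in windows]

def synthesize(coeffs, duals, n):
    Y = np.zeros(n // 2 + 1, dtype=complex)
    for c, (lo, h) in zip(coeffs, duals):
        # The windows are real, so multiplying by h undoes g up to the
        # normalisation by sum_j |g_j|^2.
        Y[lo:lo + len(h)] += np.fft.fft(c) * h
    return np.fft.irfft(Y, n=n)

n = 1024
x = np.random.default_rng(0).standard_normal(n)
windows = make_windows(n)
duals = dual_windows(n, windows)
x_hat = synthesize(analyze(x, windows), duals, n)
print(np.allclose(x, x_hat))   # True
```

Because the sum of |g_j|^2 is not exactly constant across frequency, each dual is its Gaussian multiplied by a slowly varying factor, i.e. a slightly distorted Gaussian. No COLA-style constraint on the analysis windows is needed, only that every bin is covered by some window.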

> One of the nice things about the NSGT is it lets you be really
> flexible in your filterbank design while still giving you
> invertibility.

Agreed.

In a later message, you wrote:
> Whoops, just clicked through to the documentation and it looks like
> this is the track you're on also. I'm curious if you have any
> insight into the window-selection for the analysis and synthesis
> process. It seems like the NSGT framework forces you to be a bit
> smarter with windows than just sticking to COLA, but the dual frame
> techniques should apply for regular STFT processing, right?

I'm actually not that familiar with traditional STFTs and COLA, but as
far as I can tell, the STFT is a special case of the NSGT and the same
dual frame techniques should apply.
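As a sketch of the same idea applied to an ordinary STFT: in the painless case (FFT length at least the window length), the canonical dual window is the analysis window divided by the hop-periodic overlap-add of its square. The truncated Gaussian below is not chosen to satisfy COLA (its squared overlap-add is not constant at this hop), yet reconstruction is exact. Circular framing avoids edge effects, and the function names and sizes are hypothetical.

```python
import numpy as np

def stft_dual_window(w, hop):
    """Canonical dual of an STFT window for the painless case: w divided by
    the hop-periodic sum of w**2."""
    L = len(w)
    s = np.zeros(hop)
    for l in range(L):
        s[l % hop] += w[l] ** 2
    return w / s[np.arange(L) % hop]

def stft(x, w, hop):
    n, L = len(x), len(w)
    # Circular framing; n must be a multiple of hop and at least len(w).
    return [np.fft.fft(x[(m * hop + np.arange(L)) % n] * w) for m in range(n // hop)]

def istft(frames, wd, hop, n):
    L = len(wd)
    y = np.zeros(n)
    for m, c in enumerate(frames):
        idx = (m * hop + np.arange(L)) % n
        y[idx] += np.real(np.fft.ifft(c)) * wd   # overlap-add with the dual window
    return y

n, L, hop = 1024, 256, 64
t = np.arange(L)
w = np.exp(-0.5 * ((t - L / 2) / (L / 8)) ** 2)   # truncated Gaussian analysis window
wd = stft_dual_window(w, hop)
x = np.random.default_rng(2).standard_normal(n)
x_hat = istft(stft(x, w, hop), wd, hop, n)
print(np.allclose(x, x_hat))   # True
```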
-- 
Andreas Gustafsson, g...@waxingwave.com
_______________________________________________
dupswapdrop: music-dsp mailing list
music-dsp@music.columbia.edu
https://lists.columbia.edu/mailman/listinfo/music-dsp
