On Thu, Mar 12, 2020 at 9:35 PM robert bristow-johnson <
r...@audioimagination.com> wrote:

>  i am not always persuaded that the analysis window is preserved in the
> frequency-domain modification operation.


It definitely is *not* preserved under modification, generally.

The Perfect Reconstruction condition assumes that there is no modification
to the coefficients. It's just a basic guarantee that the filterbank is
actually able to reconstruct the signal to begin with. The details of the
windows/zero-padding determine exactly what happens to all of the block
processing artifacts when you modify things.

> if it's a phase vocoder and you do the Miller Puckette thing and apply the
> same phase to an entire spectral peak, then supposedly the window shape is
> preserved on each sinusoidal component.


Even that is only approximate IIRC, in that it assumes well-separated
sinusoids or similar?

The larger point being that preserving window shape under modification is
an exceptional case that requires special handling.

> for analysis, there might be other properties of the window that are more
> important than being complementary.
>

That's true enough: this isn't as crucial in analysis-only as it is for
synthesis. Although, I do consider Parseval to be pretty bedrock in terms
of DSP intuition, and would not want to introduce frame-rate modulations
into analysis without a clear reason (of which there are many good
examples, don't get me wrong).


> if, after analysis, i am modifying each Gaussian pulse and inverse DFT
> back to the time domain, i will have a Gaussian window effectively on the
> output frame.  by multiplying by a Hann window and dividing by the original
> Gaussian window, the result has a Hann window shape and that should be
> complementary in the overlap-add.
>

So, a relevant distinction here is whether an STFT filterbank uses matching
analysis and synthesis windows. The PR condition is that their product
obeys COLA.

In the vanilla case, the analysis and synthesis windows are constrained to
match (actually they're time-reversals of one another, but that only
matters for asymmetric windows). Then, the PR condition is COLA on the
square of the (common) window, and the appropriate window is of "square
root" type, such as cosine. This is a "balanced" design, in that the
analyzer and synthesizer play equal roles in the windowing.

Note that this matching constraint removes many degrees of freedom from the
window design. In general, for mismatched analysis and synthesis windows,
the PR condition is very "loose." For example, you can use literally
anything you want for the analysis window, provided the values are finite
and non-zero (negative is okay!). Then you can pick any COLA window, and
solve for the synthesis window as their ratio. In this way you can design
PR filterbanks with arbitrarily bad performance :P
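A toy demonstration of that "anything goes" recipe (the analysis window below is deliberately arbitrary, not a recommendation): pick any finite non-zero analysis window, pick any COLA window, and the synthesis window is just their ratio.

```python
import numpy as np

N, hop = 512, 256
n = np.arange(N)

# Any finite, non-zero analysis window at all -- here a deliberately odd one
analysis = 1.5 + 0.9 * np.sin(7 * 2 * np.pi * n / N)  # stays in [0.6, 2.4]

# Pick any COLA window (Hann at 50% overlap) and solve for synthesis
cola = 0.5 - 0.5 * np.cos(2 * np.pi * n / N)
synthesis = cola / analysis

# The product analysis*synthesis is the COLA window by construction, so PR holds
num_frames = 8
acc = np.zeros(hop * (num_frames - 1) + N)
for i in range(num_frames):
    acc[i * hop : i * hop + N] += analysis * synthesis
print(np.allclose(acc[N:-N], 1.0))  # True
```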

So for the mismatched case, we need some additional design principle(s) to
drive the window designs. Offhand, there seem to be two notable approaches
to this. One is that rectangular "windows" are desired on the synthesis
side in order to accommodate zero-padding/fast convolution type operation.
Then, the analysis window is whatever COLA window you care to use for
analysis purposes. As discussed, this is only appropriate when the
modification is constrained to be a length-K FIR kernel.

The other is like your Gaussian example where you want to use a particular
window for analysis/modification reasons, and then need to square that with
the PR condition on the synthesis side. The downside here is that the
resulting synthesis windows are not as well behaved in terms of suppressing
block processing artifacts. They tend to become heavy-shouldered, exhibit
regions of amplification, etc. This can be worth it, but only if you gain
enough from the analysis/modification properties.
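To see that heavy-shouldered behavior numerically (the Gaussian width here is a hypothetical choice, picked only to make the effect visible): solving a Hann COLA target against a Gaussian analysis window yields a synthesis window that amplifies well above unity toward the frame edges.

```python
import numpy as np

N = 512
n = np.arange(N)

# Gaussian analysis window; sigma = 0.15*N is an arbitrary illustrative choice
gauss = np.exp(-0.5 * ((n - (N - 1) / 2) / (0.15 * N)) ** 2)

# Solve for the PR synthesis window against a Hann COLA target (50% overlap)
cola = 0.5 - 0.5 * np.cos(2 * np.pi * n / N)
synth = cola / gauss

# PR holds by construction, but the synthesis window has regions of gain > 1
print(np.allclose(gauss * synth, cola))  # True
print(synth.max())                        # well above 1
```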


> > Rectangular windows are a perfectly valid choice here, albeit one with
> poor sidelobe suppression.
>
> but it doesn't matter with overlap-add fast convolution.  somehow, the
> sidelobe effects come out in the wash, because we can ensure (to finite
> precision) the correctness of the output with a time-domain analysis.
>

Right, the rectangular windows are not being used for spectral estimation
in the fast convolution context, so their spectral properties are
irrelevant. They just represent a time-domain accounting of what the
circular convolution is doing.
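In numpy terms that time-domain accounting looks something like this (block/FFT sizes are arbitrary, chosen only to satisfy N >= B + K - 1): rectangular "windowing" is plain segmentation, and the zero-padding makes each circular convolution equal the linear one, so the OLA sum is exact.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(2048)   # input signal
h = rng.standard_normal(33)     # length-K FIR kernel, K = 33

B = 256                         # block size: rectangular "window" = segmentation
N = 512                         # DFT size, N >= B + K - 1 gives headroom
H = np.fft.rfft(h, N)

y = np.zeros(len(x) + len(h) - 1)
for start in range(0, len(x), B):
    frame = x[start:start + B]
    Y = np.fft.rfft(frame, N) * H   # pointwise product = circular convolution
    seg = np.fft.irfft(Y, N)        # zero-padding made it equal the linear conv
    y[start:start + len(frame) + len(h) - 1] += seg[: len(frame) + len(h) - 1]

# Matches direct time-domain convolution to machine precision
print(np.allclose(y, np.convolve(x, h)))  # True
```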


> so you're oversampling in the frequency domain because you're zero-padding
> in the time domain.
>

Correct, zero-padding in the time domain is equivalent to upsampling in the
frequency domain.
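Quick sanity check of that equivalence (signal length is arbitrary): padding to 2N samples the same DTFT twice as densely, so every other bin of the padded DFT is an original bin, with new interpolated bins in between.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(64)

X = np.fft.fft(x)            # N-point spectrum
X2 = np.fft.fft(x, 2 * 64)   # zero-pad to 2N: samples the DTFT twice as densely

# Every other bin of the oversampled spectrum is an original bin
print(np.allclose(X2[::2], X))  # True
```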


> > Note that this equivalence happens because we are adding an additional
> time-variant stage (zero-padding/raw OLA), to explicitly correct for the
> time-variant effects of the underlying DFT operation. This is the block
> processing analog of upsampling a scalar signal by K so that we can apply
> an order-K polynomial nonlinearity without aliasing.
>
> where is this polynomial nonlinearity?  and i am still not groking the
> upsampling.
>

There's no nonlinearity in STFT, I'm just making an analogy. We have some
process that we know will produce a finite amount of "out of band" output,
and so we "upsample" by exactly that amount to avoid aliasing. Just as a
polynomial nonlinearity requires frequency-domain headroom (upsampling), so
does a circular convolution require time-domain headroom (zero-padding).
It's the same basic engineering idea, just applied to different flavors of
aliasing (non-linearity vs time-variance).
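A scalar illustration of the headroom idea (the frequencies are just picked to force the problem): squaring a 35 Hz cosine sampled at 100 Hz produces a 70 Hz component, which is above Nyquist and folds down to 30 Hz; upsampling by 2 first (Nyquist 100 Hz) would have avoided that.

```python
import numpy as np

fs = 100.0
f0 = 35.0                     # squaring produces 2*f0 = 70 Hz > Nyquist (50 Hz)
M = 1024
n = np.arange(M)
x = np.cos(2 * np.pi * f0 * n / fs)

# Order-2 polynomial nonlinearity, applied without upsampling headroom
y = x * x
spec = np.abs(np.fft.rfft(y - y.mean()))  # remove DC term of cos^2 first
alias_bin = int(np.argmax(spec))
print(alias_bin * fs / M)     # ~30 Hz: the 70 Hz component aliased, not 70
```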


> i think so, but i wonder if i am understanding your "upsampling" and
> zero-padding.  we're zero-padding in the time domain (making the length
> longer to the DFT length N).  the factor of length increase in the time
> domain is the factor of oversampling in the frequency domain.  a lot of
> people make the sub-optimal decision to simply pad zeros equal to length of
> the frame of audio.  then doubling the length would be like inserting
> oversampled frequency DFT bins between each of the "original" spectral
> points.
>

Right, so as I mentioned before there are two sources of oversampling
available. In a critically sampled design, the DFT size would equal the hop
size. This is the minimum required to represent the signal, and anything on
top of that represents redundancy in the filterbank representation (and so,
additional overhead). Of course, that implies there is no overlap, the
"windows" are rectangular, spectral analysis properties are poor, aliasing
is not suppressed in synthesis, etc.

So the first layer of oversampling is overlap/windowing. This is powerful
because more data goes into each transform, and the windows explicitly
attenuate edge effects and control the spectral properties. Still, this
represents redundant overhead relative to the minimum required to represent
the signal. The other downside of overlap is that it corresponds to
latency, and reduces time resolution of the transform representation.

(Generally, it is possible to critically sample in the overlapping case,
and this is where MDCT comes in. In this case time-domain aliasing occurs
even if you *don't* modify the coefficients, and you must design the system
to satisfy TDAC in addition to PR. But this is more for coding and less for
analysis/pvoc/filtering).

The less powerful form of oversampling is zero-padding. Since there is no
additional data going into the transforms, there is no latency cost or real
increase in frequency resolution (it just interpolates the DFT of the
non-zero-padded case). What this buys is processing margin for applying
modifications in the frequency domain. And it can be helpful for analysis
if you want increased frequency resolution for some estimation task.

Ethan
_______________________________________________
dupswapdrop: music-dsp mailing list
music-dsp@music.columbia.edu
https://lists.columbia.edu/mailman/listinfo/music-dsp
